# Auto-Fix CI Workflow Implementation Checkpoint ## Overview This document captures the learnings from implementing auto-fix CI workflows that allow Claude to automatically fix CI failures and post as claude[bot]. ## Journey Summary ### Initial Goal Create an auto-fix CI workflow similar to Cursor's implementation that: 1. Detects CI failures on PRs 2. Automatically triggers Claude to fix the issues 3. Creates branches with fixes 4. Posts PR comments as claude[bot] (not github-actions[bot]) ### Key Implementation Files #### 1. Auto-Fix Workflow **File**: `.github/workflows/auto-fix-ci-inline.yml` - Triggers on `workflow_run` event when CI fails - Creates fix branch - Collects failure logs - Calls Claude Code Action with `/fix-ci` slash command - Posts PR comment with fix branch link #### 2. Fix-CI Slash Command **File**: `.claude/commands/fix-ci.md` - Contains all instructions for analyzing and fixing CI failures - Handles test failures, type errors, linting issues - Commits and pushes fixes #### 3. Claude Code Action Changes (v1-dev branch) **Modified Files**: - `src/entrypoints/prepare.ts` - Exposes GitHub token as output - `action.yml` - Adds github_token output definition ## Critical Discoveries ### 1. Authentication Architecture #### How Tag Mode Works (Success Case) 1. User comments "@claude" on PR → `issue_comment` event 2. Action requests OIDC token with audience "claude-code-github-action" 3. Token exchange at `api.anthropic.com/api/github/github-app-token-exchange` 4. Backend validates event type is in allowed list 5. Returns Claude App token → posts as claude[bot] #### Why Workflow_Run Failed 1. Auto-fix workflow triggers on `workflow_run` event 2. OIDC token has `event_name: "workflow_run"` claim 3. Backend's `allowed_events` list didn't include "workflow_run" 4. Token exchange fails with "401 Unauthorized - Invalid OIDC token" 5. Can't get Claude App token → falls back to github-actions[bot] ### 2. OIDC Token Claims GitHub Actions OIDC tokens include: - `event_name`: The triggering event (pull_request, issue_comment, workflow_run, etc.) - `repository`: The repo where action runs - `actor`: Who triggered the action - `job_workflow_ref`: Reference to the workflow file - And many other claims for verification ### 3. Backend Validation **File**: `anthropic/api/api/private_api/routes/github/github_app_token_exchange.py` The backend validates: ```python allowed_events = [ "pull_request", "issue_comment", "pull_request_comment", "issues", "pull_request_review", "pull_request_review_comment", "repository_dispatch", "workflow_dispatch", "schedule", # "workflow_run" was missing! ] ``` ### 4. Agent Mode vs Tag Mode - **Tag Mode**: Triggers on PR/issue events, creates tracking comments - **Agent Mode**: Triggers on automation events (workflow_dispatch, schedule, and now workflow_run) - Both modes can use Claude App token if event is in allowed list ## Solution Implemented ### Backend Change (PR Created) Add `"workflow_run"` to the `allowed_events` list in the Claude backend to enable OIDC token exchange for workflow_run events. ### Why This Works - No special handling needed for different event types - Backend treats all allowed events the same way - Just validates token, checks permissions, returns Claude App token - Event name only used for validation and logging/metrics ## Current Status ### Completed - ✅ Created auto-fix workflow and slash command - ✅ Modified Claude Code Action to expose GitHub token as output - ✅ Identified root cause of authentication failure - ✅ Created PR to add workflow_run to backend allowed events ### Waiting On - ⏳ Backend PR approval and deployment - ⏳ Testing with updated backend ## Next Steps Once the backend PR is merged and deployed: ### 1. Test Auto-Fix Workflow - Create a test PR with intentional CI failures - Verify auto-fix workflow triggers - Confirm Claude can authenticate via OIDC - Verify comments come from claude[bot] ### 2. Potential Improvements - Add more sophisticated CI failure detection - Handle different types of failures (tests, linting, types, build) - Add progress indicators in PR comments - Consider batching multiple fixes - Add retry logic for transient failures ### 3. Documentation - Document the auto-fix workflow setup - Create examples for different CI systems - Add troubleshooting guide ### 4. Extended Features - Support for multiple CI workflows - Customizable fix strategies per project - Integration with other GitHub Actions events - Support for monorepo structures ## Alternative Approaches (If Backend Change Blocked) ### Option 1: Repository Dispatch Instead of `workflow_run`, use `repository_dispatch`: - Original workflow triggers dispatch event on failure - Auto-fix workflow responds to dispatch event - Works today without backend changes ### Option 2: Direct PR Event Trigger on `pull_request` with conditional logic: - Check CI status in the workflow - Only run if CI failed - Keeps PR context for OIDC exchange ### Option 3: Custom GitHub App Create separate GitHub App for auto-fix: - Has its own authentication - Posts as custom bot (not claude[bot]) - More complex but fully independent ## Key Learnings 1. **OIDC Context Matters**: The event context in OIDC tokens determines authentication success 2. **Backend Validation is Simple**: Just a list check, no complex event-specific logic 3. **Agent Mode is Powerful**: Designed for automation, just needed backend support 4. **Token Flow is Critical**: Understanding the full auth flow helped identify the issue 5. **Incremental Solutions Work**: Start simple, identify blockers, fix systematically ## Resources - [GitHub Actions OIDC Documentation](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect) - [Claude Code Action Repository](https://github.com/anthropics/claude-code-action) - [Backend PR for workflow_run support](#) (Add link when available) --- *Last Updated: 2025-08-20* *Session Duration: ~6 hours* *Key Achievement: Identified and resolved Claude App authentication for workflow_run events*