6.0 KiB
Auto-Fix CI Workflow Implementation Checkpoint
Overview
This document captures the learnings from implementing auto-fix CI workflows that allow Claude to automatically fix CI failures and post as claude[bot].
Journey Summary
Initial Goal
Create an auto-fix CI workflow similar to Cursor's implementation that:
- Detects CI failures on PRs
- Automatically triggers Claude to fix the issues
- Creates branches with fixes
- Posts PR comments as claude[bot] (not github-actions[bot])
Key Implementation Files
1. Auto-Fix Workflow
File: .github/workflows/auto-fix-ci-inline.yml
- Triggers on
workflow_runevent when CI fails - Creates fix branch
- Collects failure logs
- Calls Claude Code Action with
/fix-cislash command - Posts PR comment with fix branch link
2. Fix-CI Slash Command
File: .claude/commands/fix-ci.md
- Contains all instructions for analyzing and fixing CI failures
- Handles test failures, type errors, linting issues
- Commits and pushes fixes
3. Claude Code Action Changes (v1-dev branch)
Modified Files:
src/entrypoints/prepare.ts- Exposes GitHub token as outputaction.yml- Adds github_token output definition
Critical Discoveries
1. Authentication Architecture
How Tag Mode Works (Success Case)
- User comments "@claude" on PR →
issue_commentevent - Action requests OIDC token with audience "claude-code-github-action"
- Token exchange at
api.anthropic.com/api/github/github-app-token-exchange - Backend validates event type is in allowed list
- Returns Claude App token → posts as claude[bot]
Why Workflow_Run Failed
- Auto-fix workflow triggers on
workflow_runevent - OIDC token has
event_name: "workflow_run"claim - Backend's
allowed_eventslist didn't include "workflow_run" - Token exchange fails with "401 Unauthorized - Invalid OIDC token"
- Can't get Claude App token → falls back to github-actions[bot]
2. OIDC Token Claims
GitHub Actions OIDC tokens include:
event_name: The triggering event (pull_request, issue_comment, workflow_run, etc.)repository: The repo where action runsactor: Who triggered the actionjob_workflow_ref: Reference to the workflow file- And many other claims for verification
3. Backend Validation
File: anthropic/api/api/private_api/routes/github/github_app_token_exchange.py
The backend validates:
allowed_events = [
"pull_request",
"issue_comment",
"pull_request_comment",
"issues",
"pull_request_review",
"pull_request_review_comment",
"repository_dispatch",
"workflow_dispatch",
"schedule",
# "workflow_run" was missing!
]
4. Agent Mode vs Tag Mode
- Tag Mode: Triggers on PR/issue events, creates tracking comments
- Agent Mode: Triggers on automation events (workflow_dispatch, schedule, and now workflow_run)
- Both modes can use Claude App token if event is in allowed list
Solution Implemented
Backend Change (PR Created)
Add "workflow_run" to the allowed_events list in the Claude backend to enable OIDC token exchange for workflow_run events.
Why This Works
- No special handling needed for different event types
- Backend treats all allowed events the same way
- Just validates token, checks permissions, returns Claude App token
- Event name only used for validation and logging/metrics
Current Status
Completed
- ✅ Created auto-fix workflow and slash command
- ✅ Modified Claude Code Action to expose GitHub token as output
- ✅ Identified root cause of authentication failure
- ✅ Created PR to add workflow_run to backend allowed events
Waiting On
- ⏳ Backend PR approval and deployment
- ⏳ Testing with updated backend
Next Steps
Once the backend PR is merged and deployed:
1. Test Auto-Fix Workflow
- Create a test PR with intentional CI failures
- Verify auto-fix workflow triggers
- Confirm Claude can authenticate via OIDC
- Verify comments come from claude[bot]
2. Potential Improvements
- Add more sophisticated CI failure detection
- Handle different types of failures (tests, linting, types, build)
- Add progress indicators in PR comments
- Consider batching multiple fixes
- Add retry logic for transient failures
3. Documentation
- Document the auto-fix workflow setup
- Create examples for different CI systems
- Add troubleshooting guide
4. Extended Features
- Support for multiple CI workflows
- Customizable fix strategies per project
- Integration with other GitHub Actions events
- Support for monorepo structures
Alternative Approaches (If Backend Change Blocked)
Option 1: Repository Dispatch
Instead of workflow_run, use repository_dispatch:
- Original workflow triggers dispatch event on failure
- Auto-fix workflow responds to dispatch event
- Works today without backend changes
Option 2: Direct PR Event
Trigger on pull_request with conditional logic:
- Check CI status in the workflow
- Only run if CI failed
- Keeps PR context for OIDC exchange
Option 3: Custom GitHub App
Create separate GitHub App for auto-fix:
- Has its own authentication
- Posts as custom bot (not claude[bot])
- More complex but fully independent
Key Learnings
- OIDC Context Matters: The event context in OIDC tokens determines authentication success
- Backend Validation is Simple: Just a list check, no complex event-specific logic
- Agent Mode is Powerful: Designed for automation, just needed backend support
- Token Flow is Critical: Understanding the full auth flow helped identify the issue
- Incremental Solutions Work: Start simple, identify blockers, fix systematically
Resources
- GitHub Actions OIDC Documentation
- Claude Code Action Repository
- Backend PR for workflow_run support (Add link when available)
Last Updated: 2025-08-20 Session Duration: ~6 hours Key Achievement: Identified and resolved Claude App authentication for workflow_run events