diff --git a/AUTO_FIX_CHECKPOINT.md b/AUTO_FIX_CHECKPOINT.md
new file mode 100644
index 0000000..fdd88ab
--- /dev/null
+++ b/AUTO_FIX_CHECKPOINT.md
@@ -0,0 +1,175 @@
+# Auto-Fix CI Workflow Implementation Checkpoint
+
+## Overview
+This document captures what we learned implementing auto-fix CI workflows that let Claude automatically fix CI failures and post as claude[bot].
+
+## Journey Summary
+
+### Initial Goal
+Create an auto-fix CI workflow, similar to Cursor's implementation, that:
+1. Detects CI failures on PRs
+2. Automatically triggers Claude to fix the issues
+3. Creates branches with fixes
+4. Posts PR comments as claude[bot] (not github-actions[bot])
+
+### Key Implementation Files
+
+#### 1. Auto-Fix Workflow
+**File**: `.github/workflows/auto-fix-ci-inline.yml`
+- Triggers on the `workflow_run` event when CI fails
+- Creates a fix branch
+- Collects failure logs
+- Calls Claude Code Action with the `/fix-ci` slash command
+- Posts a PR comment linking to the fix branch
+
+#### 2. Fix-CI Slash Command
+**File**: `.claude/commands/fix-ci.md`
+- Contains all instructions for analyzing and fixing CI failures
+- Handles test failures, type errors, and linting issues
+- Commits and pushes fixes
+
+#### 3. Claude Code Action Changes (v1-dev branch)
+**Modified Files**:
+- `src/entrypoints/prepare.ts` - Exposes the GitHub token as a step output
+- `action.yml` - Adds the `github_token` output definition
+
+## Critical Discoveries
+
+### 1. Authentication Architecture
+
+#### How Tag Mode Works (Success Case)
+1. User comments "@claude" on a PR → `issue_comment` event
+2. Action requests an OIDC token with audience "claude-code-github-action"
+3. Action exchanges that token at `api.anthropic.com/api/github/github-app-token-exchange`
+4. Backend validates that the event type is in the allowed list
+5. Backend returns a Claude App token → action posts as claude[bot]
+
+#### Why `workflow_run` Failed
+1. Auto-fix workflow triggers on the `workflow_run` event
+2. OIDC token has an `event_name: "workflow_run"` claim
+3. Backend's `allowed_events` list didn't include `"workflow_run"`
+4. Token exchange fails with "401 Unauthorized - Invalid OIDC token"
+5. Without a Claude App token, the action falls back to github-actions[bot]
+
+### 2. OIDC Token Claims
+GitHub Actions OIDC tokens include:
+- `event_name`: The triggering event (pull_request, issue_comment, workflow_run, etc.)
+- `repository`: The repository where the action runs
+- `actor`: Who triggered the action
+- `job_workflow_ref`: Reference to the workflow file
+- Many other claims used for verification
+
+### 3. Backend Validation
+**File**: `anthropic/api/api/private_api/routes/github/github_app_token_exchange.py`
+
+The backend checks the token's event name against an allowlist:
+```python
+allowed_events = [
+    "pull_request",
+    "issue_comment",
+    "pull_request_comment",
+    "issues",
+    "pull_request_review",
+    "pull_request_review_comment",
+    "repository_dispatch",
+    "workflow_dispatch",
+    "schedule",
+    # "workflow_run" was missing!
+]
+```
+
+### 4. Agent Mode vs Tag Mode
+- **Tag Mode**: Triggers on PR/issue events, creates tracking comments
+- **Agent Mode**: Triggers on automation events (workflow_dispatch, schedule, and now workflow_run)
+- Both modes can obtain the Claude App token if the triggering event is in the allowed list
+
+## Solution Implemented
+
+### Backend Change (PR Created)
+Add `"workflow_run"` to the `allowed_events` list in the Claude backend to enable OIDC token exchange for `workflow_run` events.
+
+### Why This Works
+- No special handling is needed per event type
+- The backend treats all allowed events the same way
+- It only validates the token, checks permissions, and returns a Claude App token
+- The event name is used only for validation and logging/metrics
+
+## Current Status
+
+### Completed
+- ✅ Created auto-fix workflow and slash command
+- ✅ Modified Claude Code Action to expose the GitHub token as an output
+- ✅ Identified the root cause of the authentication failure
+- ✅ Created a PR to add `workflow_run` to the backend's allowed events
+
+### Waiting On
+- ⏳ Backend PR approval and deployment
+- ⏳ Testing with the updated backend
+
+## Next Steps
+
+Once the backend PR is merged and deployed:
+
+### 1. Test Auto-Fix Workflow
+- Create a test PR with intentional CI failures
+- Verify the auto-fix workflow triggers
+- Confirm Claude can authenticate via OIDC
+- Verify comments come from claude[bot]
+
+### 2. Potential Improvements
+- Add more sophisticated CI failure detection
+- Handle different types of failures (tests, linting, types, build)
+- Add progress indicators in PR comments
+- Consider batching multiple fixes
+- Add retry logic for transient failures
+
+### 3. Documentation
+- Document the auto-fix workflow setup
+- Create examples for different CI systems
+- Add a troubleshooting guide
+
+### 4. Extended Features
+- Support for multiple CI workflows
+- Customizable fix strategies per project
+- Integration with other GitHub Actions events
+- Support for monorepo structures
+
+## Alternative Approaches (If the Backend Change Is Blocked)
+
+### Option 1: Repository Dispatch
+Instead of `workflow_run`, use `repository_dispatch`:
+- The original workflow sends a dispatch event on failure
+- The auto-fix workflow responds to the dispatch event
+- Works today without backend changes, since `repository_dispatch` is already allowed
+
+### Option 2: Direct PR Event
+Trigger on `pull_request` with conditional logic:
+- Check CI status in the workflow
+- Only run if CI failed
+- Keeps PR context for OIDC exchange
+
+### Option 3: Custom GitHub App
+Create a separate GitHub App for auto-fix:
+- Has its own authentication
+- Posts as a custom bot (not claude[bot])
+- More complex but fully independent
+
+## Key Learnings
+
+1. **OIDC Context Matters**: The event context in OIDC tokens determines authentication success
+2. **Backend Validation is Simple**: Just a list check, no event-specific logic
+3. **Agent Mode is Powerful**: Designed for automation, just needed backend support
+4. **Token Flow is Critical**: Understanding the full auth flow helped identify the issue
+5. **Incremental Solutions Work**: Start simple, identify blockers, fix systematically
+
+## Resources
+
+- [GitHub Actions OIDC Documentation](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect)
+- [Claude Code Action Repository](https://github.com/anthropics/claude-code-action)
+- [Backend PR for workflow_run support](#) (Add link when available)
+
+---
+
+*Last Updated: 2025-08-20*
+*Session Duration: ~6 hours*
+*Key Achievement: Identified and resolved Claude App authentication for workflow_run events*
\ No newline at end of file
diff --git a/src/entrypoints/prepare.ts b/src/entrypoints/prepare.ts
index 7c3e8d5..618236b 100644
--- a/src/entrypoints/prepare.ts
+++ b/src/entrypoints/prepare.ts
@@ -44,11 +44,18 @@ async function run() {
     // Check trigger conditions
     const containsTrigger = mode.shouldTrigger(context);
 
+    // Debug logging
+    console.log(`Mode: ${mode.name}`);
+    console.log(`Context prompt: ${context.inputs?.prompt || "NO PROMPT"}`);
+    console.log(`Trigger result: ${containsTrigger}`);
+
     // Set output for action.yml to check
     core.setOutput("contains_trigger", containsTrigger.toString());
 
     if (!containsTrigger) {
       console.log("No trigger found, skipping remaining steps");
+      // Still set github_token output even when skipping
+      core.setOutput("github_token", githubToken);
       return;
     }
 
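The checkpoint traces the 401 to a single allowlist check on the OIDC token's `event_name` claim. The TypeScript sketch below mirrors that check so the reasoning can be exercised locally. It is hypothetical: the real backend is Python and appears in this checkpoint only as the `allowed_events` list. The names `ALLOWED_EVENTS`, `base64UrlDecode`, `decodeOidcClaims`, and `isEventAllowed` are invented for illustration. The decoder also reads the JWT payload without verifying its signature, and a real token exchange must verify the signature before trusting any claim.

```typescript
// Hypothetical sketch of the backend's event allowlist check.
// Not the backend's actual code; all names here are illustrative.

// Event names from the backend list in the checkpoint, plus the proposed addition.
const ALLOWED_EVENTS: ReadonlySet<string> = new Set([
  "pull_request",
  "issue_comment",
  "pull_request_comment",
  "issues",
  "pull_request_review",
  "pull_request_review_comment",
  "repository_dispatch",
  "workflow_dispatch",
  "schedule",
  "workflow_run", // the entry the backend PR proposes to add
]);

// Subset of the GitHub Actions OIDC claims listed in the checkpoint.
interface OidcClaims {
  event_name?: string;
  repository?: string;
  actor?: string;
  job_workflow_ref?: string;
}

// Decode a base64url JWT segment into a UTF-8 string.
function base64UrlDecode(segment: string): string {
  const base64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

// Read the payload claims WITHOUT verifying the signature (sketch only).
function decodeOidcClaims(token: string): OidcClaims {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Malformed JWT: expected three dot-separated segments");
  }
  return JSON.parse(base64UrlDecode(parts[1])) as OidcClaims;
}

// The event gate reduces to a set-membership test on `event_name`.
function isEventAllowed(claims: OidcClaims): boolean {
  return claims.event_name !== undefined && ALLOWED_EVENTS.has(claims.event_name);
}
```

Because the gate is a plain set lookup, deleting `"workflow_run"` from `ALLOWED_EVENTS` reproduces the failure path the checkpoint describes. A token minted in a `workflow_run` job is rejected, while `issue_comment` tokens still pass.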