diff --git a/AUTO_FIX_CHECKPOINT.md b/AUTO_FIX_CHECKPOINT.md deleted file mode 100644 index fdd88ab..0000000 --- a/AUTO_FIX_CHECKPOINT.md +++ /dev/null @@ -1,175 +0,0 @@ -# Auto-Fix CI Workflow Implementation Checkpoint - -## Overview -This document captures the learnings from implementing auto-fix CI workflows that allow Claude to automatically fix CI failures and post as claude[bot]. - -## Journey Summary - -### Initial Goal -Create an auto-fix CI workflow similar to Cursor's implementation that: -1. Detects CI failures on PRs -2. Automatically triggers Claude to fix the issues -3. Creates branches with fixes -4. Posts PR comments as claude[bot] (not github-actions[bot]) - -### Key Implementation Files - -#### 1. Auto-Fix Workflow -**File**: `.github/workflows/auto-fix-ci-inline.yml` -- Triggers on `workflow_run` event when CI fails -- Creates fix branch -- Collects failure logs -- Calls Claude Code Action with `/fix-ci` slash command -- Posts PR comment with fix branch link - -#### 2. Fix-CI Slash Command -**File**: `.claude/commands/fix-ci.md` -- Contains all instructions for analyzing and fixing CI failures -- Handles test failures, type errors, linting issues -- Commits and pushes fixes - -#### 3. Claude Code Action Changes (v1-dev branch) -**Modified Files**: -- `src/entrypoints/prepare.ts` - Exposes GitHub token as output -- `action.yml` - Adds github_token output definition - -## Critical Discoveries - -### 1. Authentication Architecture - -#### How Tag Mode Works (Success Case) -1. User comments "@claude" on PR → `issue_comment` event -2. Action requests OIDC token with audience "claude-code-github-action" -3. Token exchange at `api.anthropic.com/api/github/github-app-token-exchange` -4. Backend validates event type is in allowed list -5. Returns Claude App token → posts as claude[bot] - -#### Why Workflow_Run Failed -1. Auto-fix workflow triggers on `workflow_run` event -2. OIDC token has `event_name: "workflow_run"` claim -3. Backend's `allowed_events` list didn't include "workflow_run" -4. Token exchange fails with "401 Unauthorized - Invalid OIDC token" -5. Can't get Claude App token → falls back to github-actions[bot] - -### 2. OIDC Token Claims -GitHub Actions OIDC tokens include: -- `event_name`: The triggering event (pull_request, issue_comment, workflow_run, etc.) -- `repository`: The repo where action runs -- `actor`: Who triggered the action -- `job_workflow_ref`: Reference to the workflow file -- And many other claims for verification - -### 3. Backend Validation -**File**: `anthropic/api/api/private_api/routes/github/github_app_token_exchange.py` - -The backend validates: -```python -allowed_events = [ - "pull_request", - "issue_comment", - "pull_request_comment", - "issues", - "pull_request_review", - "pull_request_review_comment", - "repository_dispatch", - "workflow_dispatch", - "schedule", - # "workflow_run" was missing! -] -``` - -### 4. Agent Mode vs Tag Mode -- **Tag Mode**: Triggers on PR/issue events, creates tracking comments -- **Agent Mode**: Triggers on automation events (workflow_dispatch, schedule, and now workflow_run) -- Both modes can use Claude App token if event is in allowed list - -## Solution Implemented - -### Backend Change (PR Created) -Add `"workflow_run"` to the `allowed_events` list in the Claude backend to enable OIDC token exchange for workflow_run events. - -### Why This Works -- No special handling needed for different event types -- Backend treats all allowed events the same way -- Just validates token, checks permissions, returns Claude App token -- Event name only used for validation and logging/metrics - -## Current Status - -### Completed -- ✅ Created auto-fix workflow and slash command -- ✅ Modified Claude Code Action to expose GitHub token as output -- ✅ Identified root cause of authentication failure -- ✅ Created PR to add workflow_run to backend allowed events - -### Waiting On -- ⏳ Backend PR approval and deployment -- ⏳ Testing with updated backend - -## Next Steps - -Once the backend PR is merged and deployed: - -### 1. Test Auto-Fix Workflow -- Create a test PR with intentional CI failures -- Verify auto-fix workflow triggers -- Confirm Claude can authenticate via OIDC -- Verify comments come from claude[bot] - -### 2. Potential Improvements -- Add more sophisticated CI failure detection -- Handle different types of failures (tests, linting, types, build) -- Add progress indicators in PR comments -- Consider batching multiple fixes -- Add retry logic for transient failures - -### 3. Documentation -- Document the auto-fix workflow setup -- Create examples for different CI systems -- Add troubleshooting guide - -### 4. Extended Features -- Support for multiple CI workflows -- Customizable fix strategies per project -- Integration with other GitHub Actions events -- Support for monorepo structures - -## Alternative Approaches (If Backend Change Blocked) - -### Option 1: Repository Dispatch -Instead of `workflow_run`, use `repository_dispatch`: -- Original workflow triggers dispatch event on failure -- Auto-fix workflow responds to dispatch event -- Works today without backend changes - -### Option 2: Direct PR Event -Trigger on `pull_request` with conditional logic: -- Check CI status in the workflow -- Only run if CI failed -- Keeps PR context for OIDC exchange - -### Option 3: Custom GitHub App -Create separate GitHub App for auto-fix: -- Has its own authentication -- Posts as custom bot (not claude[bot]) -- More complex but fully independent - -## Key Learnings - -1. **OIDC Context Matters**: The event context in OIDC tokens determines authentication success -2. **Backend Validation is Simple**: Just a list check, no complex event-specific logic -3. **Agent Mode is Powerful**: Designed for automation, just needed backend support -4. **Token Flow is Critical**: Understanding the full auth flow helped identify the issue -5. **Incremental Solutions Work**: Start simple, identify blockers, fix systematically - -## Resources - -- [GitHub Actions OIDC Documentation](https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/about-security-hardening-with-openid-connect) -- [Claude Code Action Repository](https://github.com/anthropics/claude-code-action) -- [Backend PR for workflow_run support](#) (Add link when available) - ---- - -*Last Updated: 2025-08-20* -*Session Duration: ~6 hours* -*Key Achievement: Identified and resolved Claude App authentication for workflow_run events* \ No newline at end of file