
Auto-Fix CI Workflow Implementation Checkpoint

Overview

This document captures the learnings from implementing auto-fix CI workflows that allow Claude to automatically fix CI failures and post as claude[bot].

Journey Summary

Initial Goal

Create an auto-fix CI workflow similar to Cursor's implementation that:

  1. Detects CI failures on PRs
  2. Automatically triggers Claude to fix the issues
  3. Creates branches with fixes
  4. Posts PR comments as claude[bot] (not github-actions[bot])

Key Implementation Files

1. Auto-Fix Workflow

File: .github/workflows/auto-fix-ci-inline.yml

  • Triggers on workflow_run event when CI fails
  • Creates fix branch
  • Collects failure logs
  • Calls Claude Code Action with /fix-ci slash command
  • Posts PR comment with fix branch link
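The trigger-and-branch part of the workflow can be sketched in Python as a function over the workflow_run event payload, which is the JSON file GitHub exposes via GITHUB_EVENT_PATH. The workflow_run.conclusion, head_branch, id and pull_requests fields come from GitHub's payload. The plan_auto_fix name, the FixPlan type and the branch-naming scheme are hypothetical. The real workflow expresses this logic in YAML.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixPlan:
    pr_number: int
    head_branch: str
    fix_branch: str
    failed_run_id: int


def plan_auto_fix(event: dict) -> Optional[FixPlan]:
    """Return a FixPlan if this workflow_run event should start an auto-fix."""
    run = event.get("workflow_run") or {}
    # Only completed runs whose conclusion is "failure" are interesting.
    if event.get("action") != "completed" or run.get("conclusion") != "failure":
        return None
    # pull_requests can be empty, e.g. for runs triggered from a fork's PR.
    prs = run.get("pull_requests") or []
    if not prs:
        return None
    # Hypothetical naming scheme: one fix branch per failed run.
    fix_branch = f"claude-auto-fix-ci-{run['head_branch']}-{run['id']}"
    return FixPlan(
        pr_number=prs[0]["number"],
        head_branch=run["head_branch"],
        fix_branch=fix_branch,
        failed_run_id=run["id"],
    )
```

In the workflow file itself, the same filter is usually an if: condition on github.event.workflow_run.conclusion == 'failure'.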

2. Fix-CI Slash Command

File: .claude/commands/fix-ci.md

  • Contains all instructions for analyzing and fixing CI failures
  • Handles test failures, type errors, linting issues
  • Commits and pushes fixes

3. Claude Code Action Changes (v1-dev branch)

Modified Files:

  • src/entrypoints/prepare.ts - Exposes GitHub token as output
  • action.yml - Adds github_token output definition
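The prepare.ts change is written in TypeScript, but the runner protocol behind it is language-neutral. A step output is a name=value line appended to the file named by GITHUB_OUTPUT, and action.yml then declares that output. The minimal Python sketch below assumes a single-line token value. The set_secret_output helper is hypothetical. Masking the value first with the ::add-mask:: workflow command is a precaution this sketch adds; the notes do not mention it.

```python
import os


def set_secret_output(name: str, value: str) -> None:
    """Expose a single-line secret as a step output (hypothetical helper)."""
    if "\n" in value or "\r" in value:
        # Multi-line values need the name<<DELIMITER form instead.
        raise ValueError("only single-line values are handled here")
    # Ask the runner to redact the value from any later log lines.
    print(f"::add-mask::{value}")
    # Step outputs are appended as name=value lines to $GITHUB_OUTPUT.
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as fh:
        fh.write(f"{name}={value}\n")
```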

Critical Discoveries

1. Authentication Architecture

How Tag Mode Works (Success Case)

  1. User comments "@claude" on PR → issue_comment event
  2. Action requests OIDC token with audience "claude-code-github-action"
  3. Token exchange at api.anthropic.com/api/github/github-app-token-exchange
  4. Backend validates event type is in allowed list
  5. Returns Claude App token → posts as claude[bot]
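Step 2 uses the runner's standard OIDC endpoint, which a job can reach only when it has the id-token: write permission. The minimal Python sketch below shows that request. ACTIONS_ID_TOKEN_REQUEST_URL, ACTIONS_ID_TOKEN_REQUEST_TOKEN and the {"value": ...} response shape are GitHub's; the function names are hypothetical. These notes do not describe the request format for the exchange in step 3, so that step is not sketched.

```python
import json
import os
import urllib.parse
import urllib.request

AUDIENCE = "claude-code-github-action"


def oidc_request_url(base_url: str, audience: str) -> str:
    # ACTIONS_ID_TOKEN_REQUEST_URL already carries a query string,
    # so the audience is appended with '&'.
    return f"{base_url}&audience={urllib.parse.quote(audience, safe='')}"


def fetch_oidc_token(audience: str = AUDIENCE) -> str:
    """Request a GitHub Actions OIDC token (requires id-token: write)."""
    url = oidc_request_url(os.environ["ACTIONS_ID_TOKEN_REQUEST_URL"], audience)
    req = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        # The runner replies with {"value": "<JWT>"}.
        return json.load(resp)["value"]
```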

Why workflow_run Failed

  1. Auto-fix workflow triggers on workflow_run event
  2. The OIDC token carries the claim event_name: "workflow_run"
  3. Backend's allowed_events list didn't include "workflow_run"
  4. Token exchange fails with "401 Unauthorized - Invalid OIDC token"
  5. Can't get Claude App token → falls back to github-actions[bot]
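The fallback in step 5 explains why the failure was silent: the workflow still ran, just under the wrong identity. The sketch below models that decision. It assumes a hypothetical exchange callable that raises TokenExchangeError on a non-2xx response, and it is not the action's actual code.

```python
from typing import Callable, Tuple


class TokenExchangeError(Exception):
    """Raised by the (hypothetical) exchange callable on a non-2xx response."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(f"{status} {message}")
        self.status = status


def resolve_token(
    exchange: Callable[[str], str], oidc_token: str, github_token: str
) -> Tuple[str, str]:
    """Return (token, identity) used for posting PR comments.

    A successful exchange yields a Claude App token; any exchange error
    falls back to the workflow's GITHUB_TOKEN.
    """
    try:
        return exchange(oidc_token), "claude[bot]"
    except TokenExchangeError as err:
        # Surface the failure in the job log instead of falling back silently.
        print(f"::warning::token exchange failed ({err}); using GITHUB_TOKEN")
        return github_token, "github-actions[bot]"
```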

2. OIDC Token Claims

GitHub Actions OIDC tokens include:

  • event_name: The triggering event (pull_request, issue_comment, workflow_run, etc.)
  • repository: The repo where action runs
  • actor: Who triggered the action
  • job_workflow_ref: Reference to the workflow file
  • And many other claims for verification
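When you debug a rejected exchange, it helps to see which claims the backend actually received. A JWT payload is base64url-encoded JSON, so the standard library can inspect it. This sketch deliberately skips signature verification and is therefore suitable only for inspection. decode_jwt_claims is a hypothetical helper name.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying it (inspection only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Print only selected claims, such as event_name or job_workflow_ref, rather than the raw token. This keeps the credential itself out of the job log.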

3. Backend Validation

File: anthropic/api/api/private_api/routes/github/github_app_token_exchange.py

The backend validates:

```python
allowed_events = [
    "pull_request",
    "issue_comment",
    "pull_request_comment",
    "issues",
    "pull_request_review",
    "pull_request_review_comment",
    "repository_dispatch",
    "workflow_dispatch",
    "schedule",
    # "workflow_run" was missing!
]
```
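Once "workflow_run" is added, as proposed under Solution Implemented, the event check reduces to set membership. The Python sketch below is hypothetical: check_event and UnauthorizedError are illustrative names, not the backend's. It assumes the token's signature, issuer and audience were already verified before this point.

```python
ALLOWED_EVENTS = frozenset({
    "pull_request",
    "issue_comment",
    "pull_request_comment",
    "issues",
    "pull_request_review",
    "pull_request_review_comment",
    "repository_dispatch",
    "workflow_dispatch",
    "schedule",
    "workflow_run",  # the proposed addition
})


class UnauthorizedError(Exception):
    """Stands in for the backend's 401 response."""


def check_event(claims: dict) -> str:
    """Return the event name if token exchange is allowed for it."""
    event = claims.get("event_name")
    if event not in ALLOWED_EVENTS:
        raise UnauthorizedError("Invalid OIDC token")
    return event
```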

4. Agent Mode vs Tag Mode

  • Tag Mode: Triggers on PR/issue events, creates tracking comments
  • Agent Mode: Triggers on automation events (workflow_dispatch, schedule, and now workflow_run)
  • Both modes can use Claude App token if event is in allowed list
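The sketch below illustrates two points: the two modes split the events between them, and token eligibility is decided separately by the allowlist. The event groupings come from the lists in these notes; the function and set names are hypothetical. The action's real mode detection presumably also depends on details such as the @claude mention, which this sketch ignores.

```python
from typing import Optional

# Hypothetical groupings, taken from the event lists in these notes.
TAG_MODE_EVENTS = frozenset({
    "issue_comment",
    "issues",
    "pull_request",
    "pull_request_review",
    "pull_request_review_comment",
})
AGENT_MODE_EVENTS = frozenset({"workflow_dispatch", "schedule", "workflow_run"})


def pick_mode(event_name: str) -> Optional[str]:
    """Return "tag", "agent", or None for events neither mode handles."""
    if event_name in TAG_MODE_EVENTS:
        return "tag"
    if event_name in AGENT_MODE_EVENTS:
        return "agent"
    return None


def can_post_as_claude(event_name: str, allowed_events: frozenset) -> bool:
    # The posting identity depends only on the backend allowlist, not the mode.
    return event_name in allowed_events
```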

Solution Implemented

Backend Change (PR Created)

Add "workflow_run" to the allowed_events list in the Claude backend to enable OIDC token exchange for workflow_run events.

Why This Works

  • No special handling needed for different event types
  • Backend treats all allowed events the same way
  • Just validates token, checks permissions, returns Claude App token
  • Event name only used for validation and logging/metrics

Current Status

Completed

  • Created auto-fix workflow and slash command
  • Modified Claude Code Action to expose GitHub token as output
  • Identified root cause of authentication failure
  • Created PR to add workflow_run to backend allowed events

Waiting On

  • Backend PR approval and deployment
  • Testing with updated backend

Next Steps

Once the backend PR is merged and deployed:

1. Test Auto-Fix Workflow

  • Create a test PR with intentional CI failures
  • Verify auto-fix workflow triggers
  • Confirm Claude can authenticate via OIDC
  • Verify comments come from claude[bot]
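The last check can be automated against the REST API. Top-level PR comments are served by the issues comments endpoint, and each comment's user.login shows which identity posted it. The helper names and the 100-comment page size are assumptions; longer threads would need pagination.

```python
import json
import urllib.request
from typing import List, Set


def fetch_pr_comments(owner: str, repo: str, pr_number: int, token: str) -> List[dict]:
    """Fetch up to 100 top-level comments on a PR (no pagination)."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/issues/{pr_number}/comments?per_page=100"
    )
    req = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def bot_authors(comments: List[dict]) -> Set[str]:
    """Logins of bot accounts that commented, e.g. {"claude[bot]"}."""
    return {
        c["user"]["login"]
        for c in comments
        if c["user"]["login"].endswith("[bot]")
    }
```

The check passes when claude[bot] is among the authors and the auto-fix comment was not posted by github-actions[bot].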

2. Potential Improvements

  • Add more sophisticated CI failure detection
  • Handle different types of failures (tests, linting, types, build)
  • Add progress indicators in PR comments
  • Consider batching multiple fixes
  • Add retry logic for transient failures
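For the failure-type item, a first step could be a cheap keyword pass over the collected logs, so the workflow can tell Claude which kind of failure it is dealing with. This is a sketch under assumptions: the categories mirror the list above, but the patterns are hypothetical and would need tuning to each project's toolchain.

```python
import re
from typing import List

# Hypothetical patterns; tune these to the tools a project actually uses.
FAILURE_PATTERNS = {
    "types": re.compile(r"error TS\d+|mypy|Type error", re.IGNORECASE),
    "lint": re.compile(r"eslint|ruff|flake8|prettier", re.IGNORECASE),
    "tests": re.compile(r"\bFAIL\b|AssertionError|\d+ failed", re.IGNORECASE),
    "build": re.compile(r"Build failed|Module not found|cannot find module", re.IGNORECASE),
}


def classify_failures(log: str) -> List[str]:
    """Return the sorted failure categories whose pattern appears in the log."""
    return sorted(kind for kind, pattern in FAILURE_PATTERNS.items() if pattern.search(log))
```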

3. Documentation

  • Document the auto-fix workflow setup
  • Create examples for different CI systems
  • Add troubleshooting guide

4. Extended Features

  • Support for multiple CI workflows
  • Customizable fix strategies per project
  • Integration with other GitHub Actions events
  • Support for monorepo structures

Alternative Approaches (If Backend Change Blocked)

Option 1: Repository Dispatch

Instead of workflow_run, use repository_dispatch:

  • Original workflow triggers dispatch event on failure
  • Auto-fix workflow responds to dispatch event
  • Works today without backend changes
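Option 1 can be sketched with GitHub's repository dispatch endpoint (POST /repos/{owner}/{repo}/dispatches), which takes an event_type and an optional client_payload. The ci-failed event type, the payload fields and the function names are hypothetical. The receiving workflow would subscribe to repository_dispatch filtered on that type and read github.event.client_payload. Note that repository_dispatch workflows run from the default branch. The sketch also assumes the sending token is allowed to create dispatch events.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str, repo: str, token: str, event_type: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request."""
    body = json.dumps({"event_type": event_type, "client_payload": payload}).encode()
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def send_dispatch(req: urllib.request.Request) -> int:
    with urllib.request.urlopen(req) as resp:
        # GitHub replies 204 No Content when the event is accepted.
        return resp.status
```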

Option 2: Direct PR Event

Trigger on pull_request with conditional logic:

  • Check CI status in the workflow
  • Only run if CI failed
  • Keeps PR context for OIDC exchange
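For Option 2, the "check CI status" step could read the check runs for the PR's head commit. The relevant endpoint is GET /repos/{owner}/{repo}/commits/{ref}/check-runs, whose response holds a check_runs list with name, status and conclusion fields. The sketch below only evaluates that list. Two choices are assumptions: which conclusions count as failures, and the ignore parameter, which keeps the auto-fix job from counting itself. It also leaves one problem open: a pull_request-triggered job may start before CI has finished, so in practice it would need to wait or poll.

```python
from typing import Iterable

# Conclusions treated as "CI failed"; a judgment call for this sketch.
FAILED_CONCLUSIONS = frozenset({"failure", "timed_out"})


def ci_failed(check_runs: Iterable[dict], ignore: Iterable[str] = ()) -> bool:
    """True if any completed, non-ignored check run has a failing conclusion."""
    skipped = set(ignore)
    return any(
        run.get("status") == "completed"
        and run.get("conclusion") in FAILED_CONCLUSIONS
        and run.get("name") not in skipped
        for run in check_runs
    )
```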

Option 3: Custom GitHub App

Create separate GitHub App for auto-fix:

  • Has its own authentication
  • Posts as custom bot (not claude[bot])
  • More complex but fully independent

Key Learnings

  1. OIDC Context Matters: The event context in OIDC tokens determines authentication success
  2. Backend Validation is Simple: Just a list check, no complex event-specific logic
  3. Agent Mode is Powerful: Designed for automation, just needed backend support
  4. Token Flow is Critical: Understanding the full auth flow helped identify the issue
  5. Incremental Solutions Work: Start simple, identify blockers, fix systematically

Resources


Last Updated: 2025-08-20
Session Duration: ~6 hours
Key Achievement: Identified and resolved Claude App authentication for workflow_run events