Walkthrough

Cancellation and Retries

Killing runaway agents and retrying transient failures.

Long agent runs go wrong in two ways: they hang on a tool that never returns, or they wander into a loop where each turn re-asks the same question. Both need timeout-and-cancel. Network blips and rate limits, by contrast, deserve retry-with-backoff, not cancellation.

Steps
  1. Per-step timeouts

    Set a maximum wall-clock time for each step. When a step exceeds its limit, Antigravity cancels it and routes the failure to the coordinator.

    // pipeline.json
    {
      "steps": [
        { "role": "planner", "timeout_s": 300 },
        { "role": "builder", "timeout_s": 900 },
        { "role": "reviewer", "timeout_s": 300 },
        { "role": "deployer", "timeout_s": 600 }
      ]
    }
    Verify: A step exceeding its budget shows as Cancelled, not stuck.
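
To make the timeout behaviour concrete, here is a minimal Python sketch of a per-step wall-clock budget. It is not Antigravity's implementation: `run_step`, `STEP_TIMEOUTS_S`, and the two stand-in tools are hypothetical names, and the budgets simply mirror the `timeout_s` values in the config above.

```python
import asyncio

# Hypothetical budgets mirroring pipeline.json (seconds per role).
STEP_TIMEOUTS_S = {"planner": 300, "builder": 900, "reviewer": 300, "deployer": 600}


async def run_step(role, work, timeout_s=None):
    """Run one step under a wall-clock budget and return (state, result)."""
    budget = STEP_TIMEOUTS_S[role] if timeout_s is None else timeout_s
    try:
        # wait_for cancels the awaited coroutine once the budget expires.
        result = await asyncio.wait_for(work(), timeout=budget)
    except asyncio.TimeoutError:
        # Report the step as cancelled so a coordinator can decide what to do next.
        return ("cancelled", None)
    return ("completed", result)


async def hung_tool():
    await asyncio.sleep(3600)  # stands in for a tool that never returns


async def quick_tool():
    return "plan ready"
```

Calling `asyncio.run(run_step("planner", hung_tool, timeout_s=0.05))` gives back a cancelled state instead of blocking, which is the property the Verify line checks for.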
  2. Retry policy

    Distinguish transient failures (timeouts, HTTP 429, and 5xx responses from external APIs) from permanent ones (a failing test, a denial by a gate). Retry the former; never retry the latter.

    {
      "retries": {
        "transient": { "max": 3, "backoff_ms": [1000, 5000, 20000] },
        "permanent": { "max": 0 }
      }
    }
    Verify: A 429 from a downstream API retries; a failing test does not.
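
The policy above can be sketched in Python as a small retry loop. This is an illustration under assumptions, not Antigravity's code: `TransientError`, `PermanentError`, `classify_http_status`, and `run_with_retries` are hypothetical names, and treating every other 4xx status as permanent is an assumption that goes beyond the config shown.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 429, 5xx)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that will not change on retry."""


def classify_http_status(status):
    """Map an HTTP status code to 'transient', 'permanent', or 'ok'."""
    if status == 429 or 500 <= status <= 599:
        return "transient"
    if status >= 400:
        return "permanent"  # assumption: other 4xx codes are not retried
    return "ok"


def run_with_retries(action, backoff_ms=(1000, 5000, 20000), sleep=time.sleep):
    """Call action(); retry only on TransientError, once per backoff entry."""
    for attempt in range(len(backoff_ms) + 1):
        try:
            return action()
        except TransientError:
            if attempt == len(backoff_ms):
                raise  # retries exhausted, surface the last failure
            sleep(backoff_ms[attempt] / 1000)  # wait before the next attempt
        # PermanentError (and any other exception) propagates immediately.
```

With the default schedule the action runs at most four times: one initial attempt plus three retries, matching `"max": 3`. Passing `sleep` as a parameter keeps the loop testable without real delays.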
  3. Manual cancel

    You can always kill a run from the CLI.

    antigravity run cancel --id run_abc123
    Verify: Run state flips to Cancelled within seconds; in-flight tools are killed.
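
How a cancel might reach an in-flight tool can be sketched with Python's `asyncio`. This is only an illustration of the general pattern (cancel the task, kill the child process in a `finally` block); `run_tool` and `demo` are hypothetical names and nothing here describes Antigravity's internals.

```python
import asyncio
import sys


async def run_tool(argv, on_start=None):
    """Run a tool as a child process; kill it if this task is cancelled."""
    proc = await asyncio.create_subprocess_exec(*argv)
    if on_start is not None:
        on_start(proc)  # let the caller know the child is running
    try:
        return await proc.wait()
    finally:
        # Reached on normal exit and on cancellation alike.
        if proc.returncode is None:
            proc.kill()
            await proc.wait()  # reap the child so it does not linger


async def demo():
    started = asyncio.get_running_loop().create_future()
    # A child that would sleep for a minute, standing in for a slow tool.
    argv = [sys.executable, "-c", "import time; time.sleep(60)"]
    task = asyncio.create_task(run_tool(argv, on_start=started.set_result))
    proc = await started  # wait until the tool is actually in flight
    task.cancel()  # what a manual cancel might trigger internally
    try:
        await task
    except asyncio.CancelledError:
        pass
    return proc.returncode  # set once the child has been killed and reaped
```

The `finally` block is the important part: it runs whether the tool finishes on its own or the surrounding task is cancelled, so a cancelled run does not leave orphaned processes behind.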
Check your understanding
Q1. Why never retry a permanent failure?