Walkthrough

Cancellation and Retries

Killing runaway agents and retrying transient failures.

Long agent runs go wrong in two ways: they hang on a tool that never returns, or they wander into a loop where each turn re-asks the same question. Both need timeout-and-cancel. Network blips and rate limits, by contrast, deserve retry-with-backoff, not cancellation.

Steps · 0 / 3 done

Per-step timeouts

Set a max wall-clock per step. Above the limit, Antigravity cancels and routes the failure to the coordinator.

// pipeline.json
{
  "steps": [
    { "role": "planner", "timeout_s": 300 },
    { "role": "builder",  "timeout_s": 900 },
    { "role": "reviewer", "timeout_s": 300 },
    { "role": "deployer", "timeout_s": 600 }
  ]
}

VerifyA step exceeding its budget shows as Cancelled, not stuck.

Retry policy
Distinguish transient (timeouts, 429, 5xx from external APIs) from permanent (test failure, deny by gate). Retry the first; never retry the second.
```
{
  "retries": {
    "transient": { "max": 3, "backoff_ms": [1000, 5000, 20000] },
    "permanent": { "max": 0 }
  }
}
```
VerifyA 429 from a downstream API retries; a failing test does not.
Manual cancel
You can always kill a run from the CLI.
```
antigravity run cancel --id run_abc123
```
VerifyRun state flips to Cancelled within seconds; in-flight tools are killed.

Check your understanding

Q1. Why never retry a permanent failure?

· Tick off the 3 step(s) above.

· Score 100% on the quiz.

Cancellation and Retries

Per-step timeouts

Retry policy

Manual cancel