Late: High-Leverage AI Agent Orchestration

Every other coding agent floods its own context with edits, retries, and implementation details until the model loses the thread. Late delegates all of that to ephemeral subagents: isolated contexts that each execute one task and are then destroyed. The orchestrator sees only plans and outcomes, never the mess. Single static binary, zero dependencies, any model.

Drop into any project and start building. Get to your first prompt in less than 10 seconds.

brew tap mlhher/late && brew install late
cd your-project
late

Not using Homebrew?

Arch Linux: yay -S late-cli-bin

Linux / macOS / Windows: Download the latest binary and drop it in your PATH. On macOS, if Gatekeeper blocks a manually downloaded binary, run xattr -d com.apple.quarantine /path/to/late

Connecting to Cloud Models? Local models (llama.cpp on :8080, the default port for llama-server) work out of the box with no configuration. For cloud providers (DeepSeek, Claude, Gemini, OpenRouter), set the OPENAI_BASE_URL, OPENAI_API_KEY, and OPENAI_MODEL environment variables.
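The fallback described above can be sketched in Go. This is a minimal illustration, not Late's actual implementation: the `EndpointConfig` type, the `resolveEndpoint` function, and the exact local URL (`http://localhost:8080/v1`) are assumptions made for the example. Only the three environment variable names and the `:8080` port come from this README.

```go
package main

import "os"

// EndpointConfig holds the three settings this README names for cloud providers.
// The type and its field names are hypothetical, for illustration only.
type EndpointConfig struct {
	BaseURL string
	APIKey  string
	Model   string
}

// assumedLocalBaseURL is an assumption: llama-server on its default port 8080,
// serving an OpenAI-compatible API under /v1.
const assumedLocalBaseURL = "http://localhost:8080/v1"

// resolveEndpoint reads OPENAI_BASE_URL, OPENAI_API_KEY, and OPENAI_MODEL through
// the supplied lookup function and falls back to the local server when no base
// URL is set. Passing the lookup in keeps the function easy to test.
func resolveEndpoint(getenv func(string) string) EndpointConfig {
	cfg := EndpointConfig{
		BaseURL: getenv("OPENAI_BASE_URL"),
		APIKey:  getenv("OPENAI_API_KEY"),
		Model:   getenv("OPENAI_MODEL"),
	}
	if cfg.BaseURL == "" {
		cfg.BaseURL = assumedLocalBaseURL
	}
	return cfg
}

// resolveFromEnvironment applies the same rules to the real process environment.
func resolveFromEnvironment() EndpointConfig {
	return resolveEndpoint(os.Getenv)
}
```

With none of the variables set, `resolveFromEnvironment()` returns a config pointing at the assumed local server. Exporting the three variables switches the same code path to a cloud provider.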

[Figure: Lead Architect forming a plan and spawning an atomic subagent for a surgical edit.]

| | Late | Claude Code | OpenCode | The Weekly Clone |
| --- | --- | --- | --- | --- |
| Workflow | Autonomous Orchestration | Manual toggling | Manual toggling | Blind execution/Manual toggling |
| Implementations | Ephemeral subagents (Context destroyed) | Floods main context window | Floods main context window | Floods main context window |
| KV-Cache | Ruthless KV cache management | Brute-force context dumping | Brute-force context dumping | Brute-force context dumping |
| System Prompt | ~1,000 tokens (Always planning workflow) | 10,000+ tokens | 10,000+ tokens | ~300-1000+ tokens (No-workflow lobotomy) |
| Dependencies | Zero-dependency static binary | Node.js | Node.js | Python / Node.js |
| Setup required | None (OOTB `llama-server` support) | Anthropic OAuth / Sign-in | Mandatory JSON tweaking | Flavor of the week JSON/YAML/TOML configs |
| Built For | Builders wanting 10x throughput | Enterprise expense accounts | Tinkering with settings | Chasing GitHub stars |

"The same model feels smarter with Late." — Reddit

"Late-CLI is mindblowing... I'm shocked that the token usage is so minimal, I keep expecting a big bill from DeepSeek's API." — GitHub Discussions

Outperforming Claude Code and Codex for Local LLM Workflows — Agent Native

Built with Late: Late is primarily developed inside Late itself.

Works with Claude, DeepSeek, Qwen, Gemma (including thinking support for Gemma), and any OpenAI-compatible API. See the Quickstart Guide for hybrid model routing, keybindings, MCP setup, Skills and more.

How It Works

Standard coding agents do all their work, whether it's planning, implementing, retrying failed edits, or self-healing, in one shared context window. Every retry, every failed implementation, every repair loop pollutes the context the model reasons from. It degrades. You blame the model. The model is fine.

Late separates concerns. A lean orchestrator (~1,000 token system prompt) reads your codebase, forms a plan, and delegates individual implementation tasks to ephemeral subagents. Each subagent gets a fresh isolated context containing only its one task and nothing else. When it completes, that context is destroyed. The orchestrator only ever sees outcomes.
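The delegation pattern described above can be illustrated with a small Go sketch. It is a toy model under stated assumptions, not Late's code: `Orchestrator`, `Outcome`, `Worker`, `runSubagent`, and `flakyWorker` are hypothetical names, and the `Worker` function stands in for a model call. The point it shows is that the retry transcript lives only inside the subagent's context, while the orchestrator's history records just the plan and the outcome.

```go
package main

import (
	"errors"
	"fmt"
)

// Outcome is the only value that crosses back to the orchestrator.
type Outcome struct {
	Task    string
	Success bool
	Summary string
}

// subagentContext is the scratch context for exactly one task.
// It is never returned, so it becomes garbage once the task ends.
type subagentContext struct {
	task       string
	transcript []string
}

// Worker performs one attempt at a task. In a real system this would call a
// model; here it is a stand-in (assumption).
type Worker func(task string, attempt int) (string, error)

// runSubagent retries inside a fresh context and reports only the outcome.
func runSubagent(task string, maxAttempts int, work Worker) Outcome {
	ctx := &subagentContext{task: task}
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		result, err := work(task, attempt)
		if err != nil {
			// Failures are recorded in the subagent's own transcript only.
			ctx.transcript = append(ctx.transcript, fmt.Sprintf("attempt %d failed: %v", attempt, err))
			continue
		}
		return Outcome{Task: task, Success: true, Summary: result}
	}
	return Outcome{Task: task, Success: false, Summary: fmt.Sprintf("gave up after %d attempts", maxAttempts)}
}

// Orchestrator keeps plans and outcomes, never subagent transcripts.
type Orchestrator struct {
	History []string
}

// Delegate records the plan, runs one ephemeral subagent (up to 3 attempts,
// an arbitrary choice), and records the outcome.
func (o *Orchestrator) Delegate(task string, work Worker) Outcome {
	o.History = append(o.History, "plan: "+task)
	out := runSubagent(task, 3, work)
	o.History = append(o.History, fmt.Sprintf("outcome: %s (success=%t)", out.Summary, out.Success))
	return out
}

// flakyWorker returns a Worker that fails its first `failures` attempts.
func flakyWorker(failures int) Worker {
	return func(task string, attempt int) (string, error) {
		if attempt <= failures {
			return "", errors.New("edit did not apply")
		}
		return "done: " + task, nil
	}
}
```

Calling `Delegate("rename function", flakyWorker(2))` on a fresh `Orchestrator` succeeds on the third attempt, yet `History` gains exactly two entries: the plan and the outcome. The two failed attempts never reach the orchestrator.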

Late manages the KV cache and context window carefully, leaving more room for reasoning. The orchestrator's context grows only from what matters: your instructions and the agent's decisions. Everything the subagent did to get there is gone with it. This is why the same model feels sharper in Late. It reasons from signal, not noise.

Features

Hybrid Model Routing: Architect the plan with a massive reasoning model (e.g., DeepSeek V4), then spawn subagents to execute it using blazing-fast, cheap local models (e.g., Gemma 4).
Exact-Match Diffs: Strict search/replace blocks with autonomous self-healing on mismatch. Edits fail loud. We never silently corrupt your files.
Human-in-the-Loop: Read-only commands are auto-approved for velocity. Mutations hard-stop for [y/N]. Trust can be granted at Session, Project, or Global scope, with TTL decay.
Stateful Resilience: The Orchestrator maintains continuous session history on disk. Close your terminal, reboot your machine, and pick up exactly where you left off.
MCP Integration: Natively map external Model Context Protocol servers directly into Late via standard I/O.
Agent Skills: Drop in reusable sets of instructions and scripts. Zero configuration or boilerplate required.
Git Worktree Support: Run independent, parallel agent instances across multiple branches without context bleeding.
Gemma 4 Thinking Mode: Standard wrappers just pipe text to an API, which means they can't trigger Gemma's reasoning. Late includes a dedicated flag to inject the exact tokens required to actually make it think.
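To make the Exact-Match Diffs idea concrete, here is a minimal Go sketch of a search/replace edit that fails loudly instead of guessing. It is an illustration under assumptions, not Late's implementation: `applyExactEdit` and the error values are hypothetical names, and rejecting a search block that matches more than once is a design choice made for this example.

```go
package main

import (
	"errors"
	"strings"
)

// Hypothetical error values for the three ways an edit can be rejected.
var (
	ErrEmptySearch = errors.New("search block is empty")
	ErrNoMatch     = errors.New("search block not found")
	ErrAmbiguous   = errors.New("search block matches more than once")
)

// applyExactEdit replaces search with replace only when search occurs exactly
// once in content. On any error the original content is returned unchanged,
// so a failed edit can never corrupt the file.
func applyExactEdit(content, search, replace string) (string, error) {
	if search == "" {
		return content, ErrEmptySearch
	}
	switch strings.Count(content, search) {
	case 0:
		return content, ErrNoMatch
	case 1:
		return strings.Replace(content, search, replace, 1), nil
	default:
		return content, ErrAmbiguous
	}
}
```

A caller can treat `ErrNoMatch` as the trigger for a repair step, for example re-reading the file and asking the subagent for a corrected search block, while the file on disk stays untouched.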

License

Built to create engineering leverage, not to supply free infrastructure for AI startups.

Free for builders: Use Late freely to write code for any project, including commercial ones. Your output is yours.

Commercial restrictions: You may not monetize Late itself. Wrapping the orchestration engine into a paid service or deploying it as enterprise infrastructure requires a commercial agreement.

Late converts to GPLv2 on February 21, 2030. Full license in LICENSE.

