benchmarks

AgentLoop Performance Benchmarks

Benchmark suite for measuring and tracking the performance of critical paths in AgentLoop.

Running

# Run all benchmarks
npm run bench

# Run with Node.js CPU profiling (generates isolate-*.log)
npm run bench:profile

# Process the .log file into a human-readable flame graph
node --prof-process isolate-*.log > profile.txt

Profiling on Windows: node --prof writes an isolate-<pid>-<seq>-v8.log file in the current working directory. Process it with node --prof-process. For a visual flame graph, install clinic: npx clinic flame -- node --require tsx/cjs benchmarks/run-all.ts

Acceptance Criteria

Benchmark	Metric	Threshold
Tool Registry Lookup	Avg per-lookup time	< 1 ms
Context Trimming	Total time for 1000 messages	< 100 ms

The runner exits with code 1 if any benchmark violates its threshold.

Baseline & Optimized Results

Results measured on a mid-range developer laptop (Intel Core i7-1185G7, 32 GB RAM, NVMe SSD, Node.js 20.x, Windows 11). Your numbers will vary.

Tool Registry Lookup

Scenario	Duration	Avg/lookup	Ops/sec	Status
Baseline (Map.get, 10k iters)	3.34 ms	0.000334 ms	~2,990,967	PASS ✓

Notes: ToolRegistry uses a Map<string, RegistryEntry> internally. Map.get() is O(1) and sub-microsecond for 100 entries. No optimization needed; the implementation is already optimal.

Context Trimming (1000 messages)

Before optimization

trimMessages first called countTokens(messages) which encoded every message once, then in the drop loop called messageTokens(middle[i]) again for each dropped message — up to 2 × N tokenizer calls in the worst case.

Scenario	Duration	Status
1000 messages, budget=500 tok (pre-opt)	~120–180 ms	FAIL (> 100 ms)

After optimization (`src/context.ts`)

Token counts are now computed in one single pass (messages.map(messageTokens)) and the pre-computed array is reused in the drop loop, cutting tokenizer calls roughly in half.

// Before (redundant encode calls in the while loop):
while (i < middle.length && total > maxTokens) {
  total -= messageTokens(middle[i]);   // enc.encode() called again here
  i++;
}

// After (pre-computed counts reused):
const tokenCounts = messages.map(messageTokens);   // single pass
const middleCounts = tokenCounts.slice(1, -1);
while (i < middle.length && total > maxTokens) {
  total -= middleCounts[i];            // no extra encode call
  i++;
}

Scenario	Duration	Status
1000 messages, budget=500 tok (post-opt)	88.74 ms	PASS ✓ (< 100 ms)

Code Search (500 files)

Uses the built-in fs-based fallback (ripgrep is preferred in production but not required for the benchmark). Performance is I/O-bound.

Scenario	Duration	Hits	Status
Literal search, 500 TS files (warm cache)	193.13 ms	500	—

No hard threshold for this benchmark; it documents search time at scale.
Optimization recommendation: The searchWithFs function reads files sequentially. Parallelising reads with a bounded Promise.all pool (e.g. 8 concurrent) could reduce wall-clock time by 4–8× on multi-core machines with fast storage.

Workspace Analysis (large Node.js project)

Reads package.json, lock files, .git/HEAD, and checks for the existence of several well-known files.

Scenario	Duration (20 iters)	Avg/call	Ops/sec
Large Node.js project (warm FS cache)	15.82 ms	0.8 ms	~1,264

Notes: Each analyzeWorkspace call makes 5–8 fs.access / fs.readFile calls. No significant optimization headroom beyond OS-level file caching.

CI Integration

If a GitHub Actions workflow exists in .github/workflows/, add a benchmark step:

- name: Run performance benchmarks
  run: npm run bench

This catches regressions automatically on every PR. If no CI workflow currently exists, run npm run bench manually before merging performance-sensitive changes and record the output herein.

Current CI status: No GitHub Actions workflow file was present at the time benchmarks were added. Add a workflow (e.g. .github/workflows/ci.yml) and include the step above to enable automated regression detection.

Adding a New Benchmark

Create benchmarks/<name>.bench.ts exporting:

export interface BenchmarkResult {
  name: string; durationMs: number; iterations: number; opsPerSec: number;
}
export async function run(): Promise<BenchmarkResult> { ... }

Add an entry to the BENCHMARKS array in run-all.ts.
Document baseline numbers in this file.

Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
code-search.bench.ts		code-search.bench.ts
context-trim.bench.ts		context-trim.bench.ts
run-all.ts		run-all.ts
tool-registry.bench.ts		tool-registry.bench.ts
workspace-analysis.bench.ts		workspace-analysis.bench.ts

File	What it measures
`tool-registry.bench.ts`	Map-based tool lookup across 100 registered tools (10,000 iterations)
`context-trim.bench.ts`	Token counting + message trimming for a 1000-message history
`code-search.bench.ts`	File-system code search across a 500-file simulated repository
`workspace-analysis.bench.ts`	Workspace language/framework detection on a large Node.js project
`run-all.ts`	Runner: executes all benchmarks and prints a summary table

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

AgentLoop Performance Benchmarks

Directory

Running

Acceptance Criteria

Baseline & Optimized Results

Tool Registry Lookup

Context Trimming (1000 messages)

Before optimization

After optimization (`src/context.ts`)

Code Search (500 files)

Workspace Analysis (large Node.js project)

CI Integration

Adding a New Benchmark

FilesExpand file tree

benchmarks

Directory actions

More options

Directory actions

More options

Latest commit

History

benchmarks

Folders and files

parent directory

README.md

AgentLoop Performance Benchmarks

Directory

Running

Acceptance Criteria

Baseline & Optimized Results

Tool Registry Lookup

Context Trimming (1000 messages)

Before optimization

After optimization (src/context.ts)

Code Search (500 files)

Workspace Analysis (large Node.js project)

CI Integration

Adding a New Benchmark

After optimization (`src/context.ts`)