What Six Months of Building AI Agents Taught Us About Tool Use

Korelos has been in private early access for six months. In that time we’ve worked closely with about a dozen teams across customer support, internal operations automation, and research. The common thread across every successful deployment hasn’t been the model or the prompt. It’s been how the team thought about tools.

Lesson 1: Most teams ship with too many tools

The instinct, when you sit down to build an agent, is to expose every API your company has. “The agent can do anything.” In practice, the more tools you expose, the worse the agent’s planning gets. We saw one team with twenty-eight tools whose agent was getting confused about which to use, even on simple requests. They cut it to seven and accuracy jumped immediately.

Start small. Add tools when you can point at a specific failure that the missing tool would have prevented. Not before.

Lesson 2: Tool descriptions are prompt engineering

The descriptions you write for each tool are not documentation. They are part of the agent’s prompt, and the model is going to make routing decisions based on them. A tool described as “Looks up order status” will be reached for very differently than one described as “Retrieves the current shipping status, last carrier event, and delivery ETA for an existing order. Use this when the customer is asking where their package is.”

The teams that took this seriously got measurably better routing. The teams that didn’t kept blaming the model.
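Writing both descriptions out as tool definitions shows why they count as prompt text. The sketch below rests on a few assumptions, all hypothetical: a generic ToolSpec shape, the tool name get_order_status, and a render_tool_section helper. Real agent frameworks and model APIs each define their own tool schema. The idea it illustrates is general, though. The description is pasted directly into the text the model reads before it decides which tool to call.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool definition; real frameworks use their own schema."""
    name: str
    description: str


# Terse: names the action, says nothing about when to use it.
TERSE = ToolSpec(
    name="get_order_status",  # hypothetical tool name
    description="Looks up order status",
)

# Routing-oriented: says what comes back and when to reach for it.
DESCRIPTIVE = ToolSpec(
    name="get_order_status",
    description=(
        "Retrieves the current shipping status, last carrier event, and delivery "
        "ETA for an existing order. Use this when the customer is asking where "
        "their package is."
    ),
)


def render_tool_section(tools: list[ToolSpec]) -> str:
    """Render tool specs as the block of text the model sees before planning.

    Whatever is in `description` becomes part of the prompt verbatim.
    """
    lines = ["You can call the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


print(render_tool_section([TERSE]))
print(render_tool_section([DESCRIPTIVE]))
```

The model sees identical tool names in both cases. Only the descriptive version gives it a reason to pick the tool for a "where is my package" question.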

Lesson 3: Tool inputs should look like commands, not API requests

Models are good at producing natural-sounding intent. They are bad at producing the kind of nested, optional, type-strict JSON that REST APIs from 2018 expect. The teams that got the best results designed tool input schemas that read the way a human would describe the action: find_order(query) instead of orders.search({"filter":{"order_id":{"eq":...}}}).

If your real API needs the second shape, write a thin adapter. The agent never sees it.
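Here is a minimal sketch of a thin adapter. It assumes a backend that accepts the nested filter shape from the orders.search example. Several pieces are hypothetical: the helper names, the "ORD-" order-ID format, the customer_email and text filter fields, and the fake backend. The real search call is passed in as a function so the sketch runs standalone.

```python
import re
from functools import partial
from typing import Any, Callable

# A search backend takes the nested request and returns matching orders.
SearchFn = Callable[[dict[str, Any]], list[dict[str, Any]]]


def build_search_request(query: str) -> dict[str, Any]:
    """Translate a plain query string into the nested filter the API expects."""
    query = query.strip()
    # Hypothetical convention: order IDs look like "ORD-" followed by digits.
    if re.fullmatch(r"ORD-\d+", query, flags=re.IGNORECASE):
        return {"filter": {"order_id": {"eq": query.upper()}}}
    if "@" in query:
        return {"filter": {"customer_email": {"eq": query.lower()}}}
    # Hypothetical free-text fallback field.
    return {"filter": {"text": {"contains": query}}}


def find_order(query: str, search: SearchFn) -> list[dict[str, Any]]:
    """The tool the agent sees: one plain string in, matching orders out."""
    return search(build_search_request(query))


def fake_orders_search(request: dict[str, Any]) -> list[dict[str, Any]]:
    """Stand-in for the real orders.search endpoint (order-ID lookups only)."""
    orders = [{"order_id": "ORD-1042", "status": "shipped"}]
    wanted = request["filter"].get("order_id", {}).get("eq")
    return [order for order in orders if order["order_id"] == wanted]


# Bind the backend so the agent-facing signature is just find_order(query).
find_order_tool = partial(find_order, search=fake_orders_search)

print(find_order_tool("ord-1042"))
# [{'order_id': 'ORD-1042', 'status': 'shipped'}]
```

The agent is handed find_order_tool, which takes a single string. The nested request shape stays inside build_search_request, so the API can change without touching the agent's schema.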

Lesson 4: Outputs need a budget

Tools that return five-megabyte JSON blobs will eat your context window. Every tool needs to either return a small, summarized result, or return a handle that the agent can choose to expand. The default of “return everything the API returned” is the single most common reason agents fall over after a few turns.
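The handle option can be sketched in a few lines. This version makes several hypothetical choices: a budget counted in characters as a rough stand-in for tokens, an in-memory dict as the result store, and the names budget_tool_output and expand_result. It shows only the handle approach. A real summarization step would instead pick the fields the agent needs. Note that the preview is a raw slice and may cut the JSON mid-object.

```python
import json
import uuid
from typing import Any

# Hypothetical budget, measured in characters as a rough proxy for tokens.
MAX_RESULT_CHARS = 2_000

# Hypothetical in-memory store of full results the agent may choose to expand.
_RESULT_STORE: dict[str, str] = {}


def budget_tool_output(result: Any, max_chars: int = MAX_RESULT_CHARS) -> dict[str, Any]:
    """Return small results inline; park large ones behind a handle."""
    serialized = json.dumps(result, default=str)
    if len(serialized) <= max_chars:
        return {"truncated": False, "content": serialized}
    handle = uuid.uuid4().hex
    _RESULT_STORE[handle] = serialized
    return {
        "truncated": True,
        "preview": serialized[:max_chars],
        "total_chars": len(serialized),
        "handle": handle,
        "hint": "Call expand_result with this handle and an offset to read more.",
    }


def expand_result(handle: str, offset: int = 0, max_chars: int = MAX_RESULT_CHARS) -> str:
    """Companion tool: return one budgeted slice of a stored result."""
    return _RESULT_STORE[handle][offset : offset + max_chars]


print(budget_tool_output({"status": "shipped"}))
# {'truncated': False, 'content': '{"status": "shipped"}'}

big = [{"event_id": i, "raw": "x" * 200} for i in range(500)]
summary = budget_tool_output(big)
print(summary["truncated"], len(summary["preview"]))
# True 2000
```

Every call to the tool now costs at most one budget's worth of context. The agent decides whether reading further slices is worth it.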

The single biggest predictor of agent success isn’t the model. It’s whether the team treated tool design as a first-class problem.

What we’re doing about it

Most of what’s gone into the Korelos tool layer over the last six months has been about making these patterns the easy default. Tool schemas are validated against best practices when you create them. Output sizes are budgeted automatically. Tool descriptions are linted for the things we’ve seen go wrong in production. None of this is glamorous, and none of it is what an agent demo on day one cares about. All of it is the difference between an agent that works for a week and one that’s still working in six months.
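As a rough illustration of what a description lint can check, here is a minimal sketch built from the advice in Lesson 2. It is not the Korelos linter, whose rules aren't described in this post. The word-count threshold, the "use this when" pattern, and the names LintIssue and lint_description are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LintIssue:
    tool: str
    message: str


# Hypothetical thresholds; tune them against your own routing failures.
MIN_DESCRIPTION_WORDS = 12
WHEN_TO_USE = re.compile(r"\buse (this|it) (when|if|for)\b", re.IGNORECASE)


def lint_description(tool_name: str, description: str) -> list[LintIssue]:
    """Flag descriptions that are too short or never say when to use the tool."""
    issues: list[LintIssue] = []
    word_count = len(description.split())
    if word_count < MIN_DESCRIPTION_WORDS:
        issues.append(LintIssue(
            tool_name, f"description has {word_count} words; routing usually needs more context"))
    if not WHEN_TO_USE.search(description):
        issues.append(LintIssue(
            tool_name, "description never says when the agent should use this tool"))
    return issues


print(len(lint_description("get_order_status", "Looks up order status")))
# 2
```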

We’re going to keep writing about what we learn. If you’re an early-access partner, expect this kind of post to be the norm. If you’re not yet, the waitlist is open.