

Token-Based Turn Counts in Agents

Because agentic features can be expensive, the number of API calls has to be managed deliberately. It is common to set a fixed turn count and move on, but for complex agent workflows a static turn count can meaningfully reduce the quality of the product. Token-based turn counts offer an alternative for high-turn-count agents that need to stay within budget.

  • AI Agents
  • Turn Counts

Introduction

The project I was working on required three agents that ran one after the other. The first agent would index the pages of a PDF document and label page ranges. The second agent would use the indexed page ranges as context to extract several data points. The third agent would then run various business rules against the extracted information and re-inspect the pages associated with inconsistent data points.

The proposed solution was to give the agents tools to complete their tasks along with a maximum number of tool calls they could use before they had to give a response or flag for human-in-the-loop review. The third agent in particular had a recommended turn count of 30 because of the types of analysis it was carrying out on dozens of data points.
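A static cap like this can be sketched in a few lines. This is only an illustration, not the project's actual code: `step` is a hypothetical stand-in for one model or tool call, and `AgentResult` is a name invented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentResult:
    answer: Optional[str]      # final response, or None if the agent ran out of turns
    needs_human_review: bool   # True when the run is flagged for human-in-the-loop review
    turns_used: int


def run_with_static_cap(
    step: Callable[[int], Optional[str]],
    max_turns: int = 30,
) -> AgentResult:
    """Run an agent loop with a fixed turn budget.

    `step(turn)` is a hypothetical callable that performs one tool call and
    returns the final answer once the agent is done, or None to keep going.
    """
    for turn in range(1, max_turns + 1):
        answer = step(turn)
        if answer is not None:
            return AgentResult(answer=answer, needs_human_review=False, turns_used=turn)
    # Out of turns without an answer: hand the run to a human reviewer.
    return AgentResult(answer=None, needs_human_review=True, turns_used=max_turns)
```

With `max_turns=30`, an agent that never finishes makes exactly 30 calls and is then flagged for review.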

If the turn count is set too low, the agent struggles to complete its task and is more likely to return lower-quality data prematurely or flag the run for review. If the turn count is set too high, the feature can become too expensive or take too long to return.

I wanted to give the third agent a high turn count because of the number of items on its to-do list, but I was worried about cost. If I gave the agent too high a turn count, the feature could occasionally go over budget.

Dynamic Turn Counts

I wanted to find a way to make the turn count dynamic. Because of the tools and relative simplicity of the first two agents, I was ok with using static turn counts for those. If I could make sure I didn't exceed a certain API cost per run for the third agent, I could increase the turn count on demand.

I could store the model metadata and token pricing information and use them, together with the token counts for each call, to compute how much I'm spending per call. If I reach turn 30 and am out of turns, I could keep going as long as the running cost stays under the set limit.
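Here is a minimal sketch of that idea, assuming each call reports its input and output token counts. Everything named here (`Pricing`, `call_cost`, `run_with_cost_extension`, the `step` callable, and the default limits) is hypothetical, and the prices are placeholders rather than any provider's real rates. The `hard_turn_limit` is an extra guard added for the sketch so the loop always terminates.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Pricing:
    input_per_million: float    # USD per 1M input tokens (placeholder value)
    output_per_million: float   # USD per 1M output tokens (placeholder value)


@dataclass
class RunResult:
    answer: Optional[str]
    turns_used: int
    spent_usd: float


# Hypothetical step: takes the turn number, returns (answer or None, input tokens, output tokens).
StepFn = Callable[[int], Tuple[Optional[str], int, int]]


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single API call, from its token counts."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def run_with_cost_extension(
    step: StepFn,
    pricing: Pricing,
    base_turns: int = 30,
    max_cost_usd: float = 0.50,
    hard_turn_limit: int = 100,
) -> RunResult:
    """Always allow `base_turns`; after that, continue only while under the cost limit."""
    spent = 0.0
    turns = 0
    while turns < hard_turn_limit:
        # Past the static budget, an extra turn is granted only if money is left.
        if turns >= base_turns and spent >= max_cost_usd:
            break
        turns += 1
        answer, input_tokens, output_tokens = step(turns)
        spent += call_cost(pricing, input_tokens, output_tokens)
        if answer is not None:
            return RunResult(answer, turns, spent)
    return RunResult(None, turns, spent)
```

Note that the check runs before each extra call, so the last call can still carry the total past `max_cost_usd`. In this sketch the first `base_turns` calls are never blocked by cost, which matches the idea of only extending the count, not shortening it.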

Because there is a risk that the model uses more tokens than expected on a given API call, I can set the cost limit slightly below the real budget, so that one unexpectedly expensive call does not push the run over.
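One way to express that headroom is as a fraction of the budget. The function names and the 10% default below are assumptions made for illustration, not values from the project.

```python
def effective_limit(budget_usd: float, safety_margin: float = 0.10) -> float:
    """Cost limit to enforce, set below the real budget by `safety_margin` (a fraction)."""
    if not 0.0 <= safety_margin < 1.0:
        raise ValueError("safety_margin must be in [0, 1)")
    return budget_usd * (1.0 - safety_margin)


def can_take_extra_turn(
    spent_usd: float, budget_usd: float, safety_margin: float = 0.10
) -> bool:
    """True if another call may start without eating into the reserved headroom."""
    return spent_usd < effective_limit(budget_usd, safety_margin)
```

For example, with a budget of 1.00 and a margin of 0.25, extra turns stop once spending reaches 0.75. The margin only protects the budget if it is at least as large as the most expensive single call that can realistically occur, so it is worth sizing it from observed per-call costs rather than picking a number by feel.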

Conclusion

This is not always the right solution. It fit this problem because of its particular parameters: a high-turn-count agent with a fixed cost budget, where token data could be used to extend the turn count. In most situations, including this one, proper evaluation and testing are still needed to arrive at a viable turn count.