The ART of getting AI Agents to pick the right tool
When you ask a smart assistant to solve a maths problem, it should know that it needs to reach for a calculator instead of a search engine. A framework called ART (Automatic Reasoning and Tool-use)—developed in 2023 by researchers from the University of Washington and Microsoft Research—helps AI systems more intelligently select the correct tool for any task.
The problem ART solves
LLMs struggle with tool selection. They either need carefully hand-crafted instructions for each scenario or make costly mistakes by calling the wrong tools. Worse, they are limited to small tool sets: beyond roughly 5-10 tools, prompt-based selection starts to degrade.
ART changes this by using embeddings, a way to represent both tools and queries as numerical coordinates in a multi-dimensional vector space. Tools with similar purposes cluster together, making selection fast and intuitive. More importantly, this approach scales: you can have hundreds or thousands of tools without significant performance penalties. Embedding models typically produce vectors of 384 to 1,536 dimensions; the count reflects the model's architecture rather than its capability.
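You can check a model's output dimension directly. For example, with the sentence-transformers library and the model used throughout this article:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')
print(model.get_sentence_embedding_dimension())
# Output: 384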
How embeddings work their magic
Embeddings represent text as numerical vectors in a high-dimensional space, where semantically similar content clusters together. Each tool description is encoded into a vector (this article uses the MiniLM-L6-v2 model with 384 dimensions), and queries are encoded into the same vector space. The system then identifies the tool whose vector is closest to the query vector.
Here's a simple example in Python:
from sentence_transformers import SentenceTransformer
import numpy as np
# Initialise embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')
# Define available tools
tools = {
    "calculator": "Perform math calculations and arithmetic",
    "weather_api": "Get current weather for any location",
    "search_engine": "Search the web for information"
}
# Convert tool descriptions to embeddings, normalised to unit length
# so that a plain dot product equals cosine similarity
tool_embeddings = model.encode(list(tools.values()), normalize_embeddings=True)
# When the user asks a question, encode it into the same vector space
query = "What's 25 times 47?"
query_embedding = model.encode([query], normalize_embeddings=True)[0]
# Find the most similar tool using cosine similarity
# Cosine similarity measures the angle between vectors (closer to 1 = more similar)
similarities = np.dot(query_embedding, tool_embeddings.T)
# np.argmax returns the index of the highest similarity score
best_tool = list(tools.keys())[np.argmax(similarities)]
print(f"Selected tool: {best_tool}")
# Output: Selected tool: calculator
From query to match: A step-by-step example
Let's trace exactly what happens when you ask "What's the weather like in Paris?"
Step 1: Query embedding
Your question is embedded as a 384-dimensional vector (simplified here):
>>> question = "What's the weather like in Paris?"
>>> query_embedding = model.encode([question])[0]
>>> query_embedding
[0.12, -0.34, 0.67, 0.89, -0.23, ...]
Step 2: Tool embeddings (pre-computed)
tool_embeddings = {
    "weather_api": [0.15, -0.31, 0.71, 0.85, -0.19, ...],
    "calculator": [-0.42, 0.78, -0.15, 0.23, 0.91, ...],
    "search_engine": [0.08, -0.12, 0.34, 0.56, -0.67, ...],
}
Step 3: Similarity calculation
Cosine similarity between query and each tool:
similarity_scores = {
    "weather_api": 0.94,    # very similar!
    "calculator": 0.12,     # not similar
    "search_engine": 0.31,  # somewhat similar
}
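These scores come from the standard cosine similarity formula: the dot product of the two vectors divided by the product of their lengths. A minimal NumPy version:
import numpy as np
def cosine_similarity(a, b):
    # Ranges from -1 (opposite direction) through 0 (unrelated) to 1 (same direction)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# e.g. cosine_similarity(query_embedding, tool_embeddings["weather_api"])  # -> 0.94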
Step 4: Tool selection
Winner: weather_api, with a cosine similarity of 0.94.
The critical requirement: Same dimensional space
Here's a crucial detail: queries and tool descriptions must live in the same vector space. You can't compare a 512-dimensional vector from one model with a 1,536-dimensional vector from another—it's like trying to compare temperatures in Celsius with distances in miles.
This means:
- Use the same embedding model for everything
- Ensure consistent preprocessing (tokenisation, normalisation)
- Keep the same model version in production (a simple guard is sketched below)
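One lightweight way to enforce the last two points is to store the model name and dimension alongside the vectors and fail fast at query time. A minimal sketch, where the index_metadata dict is illustrative rather than any particular library's API:
MODEL_NAME = "all-MiniLM-L6-v2"
index_metadata = {"model": MODEL_NAME, "dim": 384}  # persisted next to the vector index
query_embedding = model.encode(["What's 25 times 47?"])[0]
# Refuse to compare vectors that come from different embedding spaces
assert index_metadata["model"] == MODEL_NAME, "Index was built with a different model"
assert query_embedding.shape[0] == index_metadata["dim"], "Embedding dimension mismatch"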
Beyond descriptions: Using tool examples
Instead of just describing what tools do, some implementations use examples of tool outputs for matching. This can be more precise because examples capture the actual output format and structure, reducing ambiguity between tools with similar descriptions:
# Traditional approach: tool descriptions
tools_descriptions = {
    "weather_api": "Get current weather for any location",
    "calculator": "Perform math calculations and arithmetic"
}
# Example-based approach: actual tool outputs
tools_examples = {
    "weather_api": [
        "Temperature: 72°F, Condition: Sunny, Humidity: 45%",
        "Current weather in London: 18°C, Cloudy with light rain",
        "Weather forecast: High 85°F, Low 62°F, Partly cloudy"
    ],
    "calculator": [
        "25 × 47 = 1,175",
        "√144 = 12",
        "sin(π/2) = 1.0"
    ]
}
# Create one embedding per tool from its examples
pooled_tool_embeddings = {}
for tool, examples in tools_examples.items():
    example_embeddings = model.encode(examples, normalize_embeddings=True)
    # Mean-pooling here; an alternative is max similarity across examples
    pooled_tool_embeddings[tool] = np.mean(example_embeddings, axis=0)
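As the comment above hints, an alternative to mean-pooling is to keep every example embedding and let the single closest example decide. A sketch under the same assumptions:
# Keep all example embeddings per tool instead of pooling them
all_example_embeddings = {
    tool: model.encode(examples, normalize_embeddings=True)
    for tool, examples in tools_examples.items()
}
def select_tool_by_max(query):
    q = model.encode([query], normalize_embeddings=True)[0]
    # Score each tool by its single most similar example
    scores = {tool: float(np.max(embs @ q)) for tool, embs in all_example_embeddings.items()}
    return max(scores, key=scores.get)
print(select_tool_by_max("How much is 144 divided by 12?"))
# Output: calculator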
Benefits of example-based matching
- More precise understanding of what tools actually produce
- Better handling of edge cases and formatting
- Reduced ambiguity between similar tools
Challenges
- Requires collecting representative examples
- Higher computational overhead (more embeddings to store)
- Risk of overfitting to specific example formats
Reported results suggest this approach can improve selection accuracy by 3-5 percentage points, especially when tools have overlapping capabilities.
The selection process
The workflow is surprisingly simple:
- Offline preparation: Each tool's description gets embedded and stored in a vector database
- Query time: Your question gets embedded into the same numerical space
- Similarity matching: The system calculates cosine similarity between the query vector and each tool vector, essentially measuring how closely they point in the same direction
- Tool selection: The tool with the highest similarity score wins
This process takes mere milliseconds, compared to seconds for traditional LLM-based routing.
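Here is what those two phases might look like against an actual vector index. This sketch uses FAISS purely as an example (any vector store works) and reuses the model and tools from the first snippet:
import numpy as np
import faiss  # pip install faiss-cpu
# Offline preparation: embed tool descriptions once and index them.
# Unit-length vectors mean inner product equals cosine similarity.
tool_matrix = model.encode(list(tools.values()), normalize_embeddings=True)
index = faiss.IndexFlatIP(tool_matrix.shape[1])  # 384 dimensions for MiniLM-L6-v2
index.add(tool_matrix.astype(np.float32))
# Query time: embed the question and retrieve the nearest tool
q = model.encode(["What's the weather like in Paris?"], normalize_embeddings=True)
scores, ids = index.search(q.astype(np.float32), k=1)
print(list(tools.keys())[ids[0][0]])
# Output: weather_api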
Why this approach wins
The benefits are substantial. Embedding-based selection is dramatically faster and cheaper than asking an LLM to decide which tool to use.[1] Research shows significant accuracy improvements, particularly when using advanced techniques like HyDE (Hypothetical Document Embeddings).[5]
More importantly, it captures semantic similarity in vector space. Ask "What's the temperature outside?" and it correctly picks the weather API—even though you never mentioned "weather." Traditional keyword matching would fail here.
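HyDE, mentioned above, changes only the query side: instead of embedding the question itself, you have an LLM draft a plausible answer and embed that draft, which often lands closer to the right tool's embedding. A minimal sketch, reusing the tool_embeddings matrix from the first snippet; draft_hypothetical_output stands in for whichever LLM call you prefer:
def draft_hypothetical_output(query):
    # In practice this is an LLM call such as "Write a plausible tool
    # output answering: {query}"; hard-coded here to keep the sketch runnable
    return "Current temperature: 18°C, clear skies, light breeze"
query = "What's the temperature outside?"
# Embed the imagined answer, not the question, then match as before
hyde_embedding = model.encode([draft_hypothetical_output(query)], normalize_embeddings=True)[0]
similarities = np.dot(tool_embeddings, hyde_embedding)
print(list(tools.keys())[np.argmax(similarities)])
# Output: weather_api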
Real-world impact
This technology powers everything from customer support systems routing queries to specialised agents, to software development assistants selecting the right code analysis tools. ART demonstrated significant improvements over traditional prompting methods, with further gains when tools are properly selected.[1]
The beauty of embedding-based tool selection lies in its simplicity: represent everything as vectors, measure similarity, and let geometry do the work. No complex training required, just clear tool descriptions and a good embedding model.
References
1. Paranjape, B., Lundberg, S., Singh, S., Hajishirzi, H., Zettlemoyer, L., & Ribeiro, M. T. (2023). ART: Automatic multi-step reasoning and tool-use for large language models. arXiv preprint arXiv:2303.09014. https://arxiv.org/abs/2303.09014
2. Hao, S., Liu, T., Wang, Z., & Hu, X. (2023). ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings. Proceedings of the 37th International Conference on Neural Information Processing Systems. https://huggingface.co/papers/2305.11554
3. AI Agent Routers: Techniques, Practices & Tools for Routing Logic. Deepchecks. https://www.deepchecks.com/ai-agent-routers-techniques-best-practices-tools/
4. Embedding-based Routing (EBR). Agentic Design Patterns. https://agentic-design.ai/patterns/routing/embedding-based-routing
5. Baltazar, A. (2024). Tool Pre-Selection Using Embeddings (and HyDE). GitHub Repository. https://github.com/AndreBaltazar8/tool-pre-selection
6. Intent Recognition and Auto-Routing in Multi-Agent Systems. GitHub Gist. https://gist.github.com/mkbctrl/a35764e99fe0c8e8c00b2358f55cd7fa
7. What are Vector Embeddings? IBM Think. https://www.ibm.com/think/topics/vector-embedding
8. How to handle large numbers of tools. LangChain Documentation. https://langchain-ai.github.io/langgraph/how-tos/many-tools/