R&D in agent-based systems

Discover what the researchers at Foaster are working on!

Research at Foaster.ai

At Foaster.ai, we develop AI agents every week, constantly pushing their limits.

Our belief is simple: AI agents are becoming digital teammates. As they gain responsibility and autonomy in critical tasks, understanding their behavioral patterns, decisions, and social dynamics becomes essential.

Model Intelligence: aiming for the best agent-model fit

We conduct applied research to map how models actually behave, so that we can match the right model with the right agent (sales, support, back-office, monitoring...).

Rather than judging LLMs solely on code or math, we test their social, strategic, and long-term behaviors in multi-agent contexts, because these behaviors determine whether agents are reliable in the real world.

Why it's decisive

Choosing a model is no longer a matter of brand or specs. It's a question of fit: persuasion vs. resistance, cooperation style, failure modes, latency/cost trade-offs, robustness under pressure.

Our research provides evidence, not guesswork.

Our method, in brief

Hierarchical multi-agent simulations with tools, roles, and incomplete information, which are much closer to real workflows than static prompts.

  • Role-conditioned metrics to analyze models from different angles.

  • Behavioral signals beyond win rates.

  • Reproducible protocols and an agent-with-tools framework for realism.

Focus: the Werewolf benchmark

Why Werewolf

A 100% language game, adversarial and socially demanding: hidden roles, uncertainty, evolving narratives.

It reveals whether a model can plan over several in-game days, coordinate, persuade, bluff, and withstand pressure: exactly the skills that make enterprise agents robust.

What we're running

Round-robin matches between models, balanced by role, with Elo leaderboards and breakdown by role (wolves = manipulation, villagers = resistance).
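To make the rating idea concrete, here is a minimal sketch of a role-balanced round-robin schedule with per-role Elo ratings. It uses the standard Elo formula; the K-factor of 32, the starting rating of 1500, the model names, and the function names are all illustrative assumptions, not Foaster's actual protocol.

```python
from collections import defaultdict
from itertools import permutations

K = 32  # assumed update step; the source does not state one


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of side A against side B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def round_robin(models):
    """Every ordered (wolf, villager) pair, so each model plays
    both roles against every opponent."""
    return list(permutations(models, 2))


def update(ratings, wolf_model, villager_model, wolves_won):
    """Update the per-role ratings of both sides after one game."""
    wolf_key = (wolf_model, "wolf")
    villager_key = (villager_model, "villager")
    exp_wolf = expected_score(ratings[wolf_key], ratings[villager_key])
    delta = K * ((1.0 if wolves_won else 0.0) - exp_wolf)
    ratings[wolf_key] += delta
    ratings[villager_key] -= delta


# Hypothetical model names; every (model, role) pair starts at 1500.
ratings = defaultdict(lambda: 1500.0)
schedule = round_robin(["model_a", "model_b", "model_c"])

# One example result: model_a wins as wolves against model_b's villagers.
update(ratings, "model_a", "model_b", wolves_won=True)
print(ratings[("model_a", "wolf")])      # 1516.0
print(ratings[("model_b", "villager")])  # 1484.0
```

Keeping a separate rating per (model, role) pair is what allows a breakdown by role: the wolf rating tracks manipulation, the villager rating tracks resistance.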

We also capture public vs. private reasoning to study intention vs. narrative: how a model actually wins (or fails), not just whether it does.
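One way to picture the public/private split is a per-turn record that stores both channels, with only the public one shown to other players. This is a hypothetical sketch; the `Turn` fields and the `public_view` helper are assumptions for illustration, not the benchmark's real logging schema.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    model: str
    role: str
    private_reasoning: str  # kept in the log, never shown to other players
    public_statement: str   # what the model says to the table


def public_view(turns):
    """Return what other players see: speaker and statement only."""
    return [(t.model, t.public_statement) for t in turns]


log = [
    Turn(
        model="model_a",
        role="wolf",
        private_reasoning="I should deflect suspicion onto model_b.",
        public_statement="I was asleep all night.",
    )
]
print(public_view(log))  # [('model_a', 'I was asleep all night.')]
```

Comparing `private_reasoning` with `public_statement` for the same turn is where intention and narrative can be set side by side.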

What you receive

A model leaderboard and model cards detailing strengths, weaknesses, and failure modes.

Concrete agent-model recommendations (e.g., which model to place behind your outreach agent vs. your monitoring agent).

Guardrails & prompts tailored to the tendencies of each model, plus budget/latency advice for production.

What's next

We are moving on to longer and more complex games, more model families, and expanded behavioral metrics.

The goal is simple, deliberately competitive: who can beat the current leader?

Want to have your model evaluated or co-fund larger runs? Contact us.

Don't choose your model blindly.

We benchmark its behavior as an agent and then integrate the best fit into your stack.

Any questions?

We are here to assist you - please send us an email!

Ready to hire AI agents?
