TL;DR: DeepEval for pytest-native open-source evaluation. Braintrust for full-lifecycle eval with CI/CD quality gates. Arize Phoenix for vendor-neutral self-hosted tracing and eval. LangSmith if you are all-in on LangChain. Comet Opik for budget-conscious teams running high-volume traces.
Promptfoo Is Gone. Now What?
On March 9, OpenAI acquired Promptfoo for $86 million. Promptfoo was the most widely used open-source LLM eval and red-teaming CLI -- 10,800 GitHub stars, used by thousands of teams testing prompts, model outputs, and agent behavior across every major provider.
The acquisition raises an immediate question for anyone using non-OpenAI models: will Promptfoo stay vendor-neutral? The team says yes. The incentive structure says maybe not.
Whether you are running agents on Nebula, LangGraph, CrewAI, or your own framework, eval tooling is non-negotiable. Agents that call tools, make decisions, and interact with production systems need automated testing that catches regressions before they reach users.
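What does that kind of automated testing look like in practice? Here is a minimal, framework-agnostic sketch of the pytest-native pattern tools like DeepEval build on: treat each agent behavior as a test case and gate on assertions. `run_agent` is a hypothetical stand-in for your agent call, not any real library API.

```python
def run_agent(prompt: str) -> dict:
    # Hypothetical stub shaped like a tool-calling agent's response.
    # In a real suite this would invoke your model or agent framework.
    return {"tool": "get_weather", "args": {"city": "Paris"}, "answer": "18°C and sunny"}

def test_agent_selects_weather_tool():
    result = run_agent("What's the weather in Paris?")
    # Regression gate: the agent must route to the right tool...
    assert result["tool"] == "get_weather"
    # ...with the entity correctly extracted from the prompt.
    assert result["args"]["city"] == "Paris"
```

Because it is just pytest, the same suite slots directly into CI, which is exactly the quality-gate workflow the full-lifecycle platforms above productize.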