TL;DR: DeepEval for pytest-native open-source evaluation. Braintrust for full-lifecycle eval with CI/CD quality gates. Arize Phoenix for vendor-neutral self-hosted tracing and eval. LangSmith if you are all-in on LangChain. Comet Opik for budget-conscious teams running high-volume traces.
Promptfoo Is Gone. Now What?
On March 9, OpenAI acquired Promptfoo for $86 million. Promptfoo was the most widely used open-source LLM eval and red-teaming CLI -- 10,800 GitHub stars, used by thousands of teams testing prompts, model outputs, and agent behavior across every major provider.
The acquisition raises an immediate question for anyone using non-OpenAI models: will Promptfoo stay vendor-neutral? The team says yes. The incentive structure says maybe not.
Whether you are running agents on Nebula, LangGraph, CrewAI, or your own framework, eval tooling is non-negotiable. Agents that call tools, make decisions, and interact with production systems need automated testing that catches regressions before they reach users.
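What does that kind of automated testing look like in practice? Here is a minimal, framework-agnostic sketch of the pytest-native pattern tools like DeepEval build on: treat each agent behavior as a test case and gate on assertions. `run_agent` is a hypothetical stand-in for your agent call, not any real library API.

```python
def run_agent(prompt: str) -> dict:
    # Hypothetical stub shaped like a tool-calling agent's response.
    # In a real suite this would invoke your model or agent framework.
    return {"tool": "get_weather", "args": {"city": "Paris"}, "answer": "18°C and sunny"}

def test_agent_selects_weather_tool():
    result = run_agent("What's the weather in Paris?")
    # Regression gate: the agent must route to the right tool...
    assert result["tool"] == "get_weather"
    # ...with the entity correctly extracted from the prompt.
    assert result["args"]["city"] == "Paris"
```

Because it is just pytest, the same suite slots directly into CI, which is exactly the quality-gate workflow the full-lifecycle platforms above productize.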