LLMs aren’t solo solvers anymore: they’re forming teams. The open-source Relai-SDK runs a repeatable learning loop (simulate → evaluate → optimize) for AI agents, combining synthetic traces with real data, built-in prompts, and LLM and human-in-the-loop evaluators [1]. It also uses Maestro to tune prompts, configs, and agent graphs for better quality, cost, and latency [1].
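To make the loop concrete, here is a minimal, framework-free sketch of simulate → evaluate → optimize. It is not Relai-SDK's API: `AgentConfig`, `run_agent`, `simulate`, `evaluate`, and `optimize` are hypothetical names, the "agent" is a toy function standing in for an LLM call, and the optimizer is a plain best-of-N search over candidate configs, scored on accuracy minus a cost penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    # Hypothetical knobs an optimizer could tune (prompt text, step budget).
    system_prompt: str
    max_steps: int


@dataclass(frozen=True)
class Trace:
    task: str
    output: str
    steps: int


def run_agent(config: AgentConfig, task: str) -> Trace:
    # Toy stand-in for an LLM agent: it only follows the "uppercase" instruction
    # when the prompt mentions it, and it always spends its full step budget.
    output = task.upper() if "uppercase" in config.system_prompt else task
    return Trace(task=task, output=output, steps=config.max_steps)


def simulate(config: AgentConfig, tasks: list[str]) -> list[Trace]:
    # Simulate: replay synthetic tasks through the agent and record traces.
    return [run_agent(config, t) for t in tasks]


def evaluate(traces: list[Trace], cost_weight: float = 0.01) -> float:
    # Evaluate: accuracy against a known-correct answer, minus a cost penalty.
    accuracy = sum(t.output == t.task.upper() for t in traces) / len(traces)
    avg_steps = sum(t.steps for t in traces) / len(traces)
    return accuracy - cost_weight * avg_steps


def optimize(candidates: list[AgentConfig], tasks: list[str]) -> AgentConfig:
    # Optimize: keep the candidate whose simulated traces score best.
    return max(candidates, key=lambda c: evaluate(simulate(c, tasks)))


tasks = ["refund order 12", "reset password", "cancel plan"]
candidates = [
    AgentConfig("You are helpful.", max_steps=3),
    AgentConfig("Answer in uppercase.", max_steps=10),
    AgentConfig("Answer in uppercase.", max_steps=2),
]
best = optimize(candidates, tasks)
print(best)  # AgentConfig(system_prompt='Answer in uppercase.', max_steps=2)
```

Both uppercase configs are fully accurate, so the cost term breaks the tie in favor of the cheaper one; a real optimizer would search a much larger space (prompts, models, graph structure) rather than a fixed list.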
Kiln's new Agent Builder lets you assemble agentic systems in minutes from tools, subtasks, and state memory. It emphasizes context management and multi-actor patterns so each subtask stays focused, and Kiln's evals let you compare prompts, models, and designs on cost and speed [4].
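One way to picture "subtasks stay focused" is an orchestrator that hands each sub-agent only the slice of shared state it declares it needs. The sketch below assumes that pattern; it does not use Kiln's API, and `SubAgent`, `Orchestrator`, `context_keys`, and `dispatch` are hypothetical names. The sub-agent's `run` method is a stand-in for a model call that simply reports what context it received.

```python
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    name: str
    # Shared-state keys this sub-agent is allowed to see (hypothetical scheme).
    context_keys: tuple[str, ...]

    def run(self, subtask: str, context: dict[str, str]) -> str:
        # Stand-in for a model call: report which context it was given.
        visible = ", ".join(f"{k}={v}" for k, v in sorted(context.items()))
        return f"{self.name} handled '{subtask}' with [{visible}]"


@dataclass
class Orchestrator:
    agents: dict[str, SubAgent]
    memory: dict[str, str] = field(default_factory=dict)

    def dispatch(self, agent_name: str, subtask: str) -> str:
        agent = self.agents[agent_name]
        # Context management: pass only the keys this sub-agent needs,
        # so its prompt stays short and on-topic.
        scoped = {k: self.memory[k] for k in agent.context_keys if k in self.memory}
        result = agent.run(subtask, scoped)
        # State memory: store the result so later subtasks can build on it.
        self.memory[f"{agent_name}_result"] = result
        return result


orch = Orchestrator(
    agents={
        "researcher": SubAgent("researcher", ("topic",)),
        "writer": SubAgent("writer", ("topic", "researcher_result")),
    },
    memory={"topic": "vector DBs", "chat_history": "(long, unrelated)"},
)
first = orch.dispatch("researcher", "collect facts")
second = orch.dispatch("writer", "draft summary")
print(first)  # researcher handled 'collect facts' with [topic=vector DBs]
print(second)
```

Neither sub-agent ever sees `chat_history`, while the writer still receives the researcher's output through shared memory.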
MCP Agent Mail works like Gmail for coding agents: it lets them message each other across repos, reserve access to files, and collaborate with frontier models, all in an open-source package [2]. A slick web view lets humans oversee the flow and nudge agents when needed.
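The two coordination primitives mentioned here, per-agent inboxes and file reservations, can be sketched in a few lines. This is an illustration under assumptions, not MCP Agent Mail's actual tools or data model: `MailHub`, `Message`, `reserve`, and `ttl_s` are hypothetical names, and reservations are modeled as advisory, time-limited leases that any agent may take over once they expire.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    sender: str
    recipient: str
    subject: str
    body: str


@dataclass
class MailHub:
    # recipient -> queued messages
    inboxes: defaultdict = field(default_factory=lambda: defaultdict(list))
    # file path -> (holding agent, lease expiry time)
    leases: dict = field(default_factory=dict)

    def send(self, msg: Message) -> None:
        self.inboxes[msg.recipient].append(msg)

    def read(self, agent: str) -> list:
        # Drain the inbox so each message is delivered once.
        messages = self.inboxes[agent]
        self.inboxes[agent] = []
        return messages

    def reserve(self, agent: str, path: str, ttl_s: float,
                now: Optional[float] = None) -> bool:
        # Advisory lease: grant if the path is free, already held by this
        # agent, or the previous lease has expired; otherwise refuse.
        now = time.monotonic() if now is None else now
        holder = self.leases.get(path)
        if holder is not None and holder[0] != agent and holder[1] > now:
            return False
        self.leases[path] = (agent, now + ttl_s)
        return True


hub = MailHub()
print(hub.reserve("alice", "src/api.py", ttl_s=60, now=0))  # True
print(hub.reserve("bob", "src/api.py", ttl_s=60, now=10))   # False: alice holds it
hub.send(Message("bob", "alice", "src/api.py", "Ping me when you release it?"))
print([m.subject for m in hub.read("alice")])               # ['src/api.py']
print(hub.reserve("bob", "src/api.py", ttl_s=60, now=61))   # True: lease expired
```

Passing `now` explicitly keeps the example deterministic; in practice the default monotonic clock would be used. Making leases expire means a crashed agent cannot lock a file forever.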
TSK runs background agents such as Claude and Codex in sandboxed containers, queues tasks, and executes multiple agents in parallel, handing back each finished task as a separate git branch for review so your own checkout is left undisturbed [5]. It’s a glimpse of background automation that can handle code reviews and ongoing work without constant supervision.
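The queue-and-parallelize pattern can be sketched with the standard library. The sketch assumes one isolated workspace per task and one review branch per task; TSK's internals are not described here. Temporary directories stand in for containers, no git commands are actually run, and `run_in_sandbox`, `branch_name`, and the `agent/<slug>` naming scheme are hypothetical.

```python
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def branch_name(task: str) -> str:
    # Hypothetical naming scheme: one review branch per task.
    slug = re.sub(r"[^a-z0-9]+", "-", task.lower()).strip("-")
    return f"agent/{slug}"


def run_in_sandbox(task: str) -> tuple[str, list[str]]:
    # Each task gets its own scratch directory (a stand-in for a container
    # with a private checkout), so parallel agents never touch the same files.
    with tempfile.TemporaryDirectory() as workdir:
        notes = Path(workdir) / "CHANGES.md"
        notes.write_text(f"Work for: {task}\n")  # stand-in for the agent's edits
        changed = sorted(p.name for p in Path(workdir).iterdir())
    # A real tool would commit `changed` to this branch for human review.
    return branch_name(task), changed


tasks = ["Fix flaky login test", "Add retries to HTTP client", "Update README"]
with ThreadPoolExecutor(max_workers=3) as pool:
    # pool.map runs tasks concurrently but yields results in submission order.
    results = list(pool.map(run_in_sandbox, tasks))

for branch, files in results:
    print(branch, files)
```

Keeping every agent in its own workspace and delivering work as a branch is what lets the work proceed in the background: nothing lands on the main line until a human reviews and merges it.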
Taken together, these tools point to a productivity paradigm where LLM teams do the heavy lifting, with open-source glue keeping them honest.
References
[1] Show HN: Relai-SDK – simulate → evaluate → optimize AI agents.
Open-source Relai SDK for simulating, evaluating, and optimizing AI agents with LLM evaluators, prompts, and graph-level tuning for quality, cost, and latency.
[2] MCP Agent Mail.
Open-source tool that coordinates multiple coding agents across repos with a Gmail-like UI, frontier-model collaboration, and human oversight.
[4] Kiln Agent Builder (new): Build agentic systems in minutes with tools, sub-agents, RAG, and context management [Kiln].
Promotes Kiln's agent building, subtasks, context management, and evals for optimizing model choice, prompts, and performance.
[5] TSK.
Tool for running background AI agents (Claude, Codex) in sandboxed containers, with parallel execution and automatic commits for review in Git repositories.