Four AI engineering libraries.
One agent. Watch them work.

Every message is traced by TraceForge and scored by Evalify. RedForge probes the agent adversarially on your first message, and StateForge versions the agent's memory as it changes. Send a question to see them in action.

TraceForge

Records every LLM call, tool call, token count and latency as replayable spans.
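To make "replayable spans" concrete, here is a minimal sketch of the idea in plain Python. It is not TraceForge's API: `Span`, `Trace`, `record` and `replay` are hypothetical names chosen for illustration, and the whitespace token count is a stand-in for a real tokenizer.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Span:
    """One recorded call (hypothetical shape, not TraceForge's schema)."""
    name: str
    kind: str            # "llm" or "tool"
    inputs: dict
    output: Any
    tokens: int
    latency_ms: float


@dataclass
class Trace:
    spans: list = field(default_factory=list)

    def record(self, name: str, kind: str, fn: Callable[..., Any], inputs: dict) -> Any:
        """Call fn(**inputs), time it, and store the result as a span."""
        start = time.perf_counter()
        output = fn(**inputs)
        latency_ms = (time.perf_counter() - start) * 1000
        # Assumption: whitespace-split word count stands in for a real token count.
        tokens = len(str(output).split())
        self.spans.append(Span(name, kind, inputs, output, tokens, latency_ms))
        return output

    def replay(self) -> list:
        """Return recorded (name, output) pairs in order, without re-running any call."""
        return [(s.name, s.output) for s in self.spans]
```

Because each span keeps its inputs and output, a turn can be inspected or replayed later without calling the model or the tool again.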

Evalify

LLM-judge scoring for relevance, accuracy and safety, after every turn.

RedForge

Prompt-injection & jailbreak probes run in the background on your first message.

StateForge

Git-like memory snapshots and diffs of what the agent knew, and when.
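The snapshot-and-diff idea can be sketched in a few lines of Python. This is an illustration under assumptions, not StateForge's API: `MemoryStore`, `commit` and `diff` are hypothetical names, and memory is modeled as a flat dictionary.

```python
import copy


class MemoryStore:
    """Toy versioned memory (hypothetical, for illustration only)."""

    def __init__(self):
        self.memory = {}
        self.snapshots = []  # list of (label, deep copy of memory at commit time)

    def commit(self, label: str) -> int:
        """Freeze the current memory and return the snapshot's index."""
        self.snapshots.append((label, copy.deepcopy(self.memory)))
        return len(self.snapshots) - 1

    def diff(self, old: int, new: int) -> dict:
        """Compare two snapshots by key: what was added, removed, or changed."""
        a = self.snapshots[old][1]
        b = self.snapshots[new][1]
        return {
            "added": {k: b[k] for k in b.keys() - a.keys()},
            "removed": {k: a[k] for k in a.keys() - b.keys()},
            "changed": {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
        }
```

Committing once per turn gives an ordered history, so diffing two turns answers "what did the agent learn between them".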