Groundtruth

Eval harness for small product teams.

Type

Tool

Status

Beta

Year

2026

Tags

AI, Evals, Internal

Figure: Compare runs

§ Brief

We kept building the same eval framework over and over for different projects. So we extracted it. Groundtruth ingests your existing logs, helps you label slices, and runs structured comparisons across model versions.

§ AI use

Slice labeling assisted by GPT-4o + Claude for diversity.
Structured comparison output is deterministic: every chart is a SQL query.
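
To make the "every chart is a SQL query" idea concrete, here is a minimal sketch of what a per-slice comparison across model versions could look like. It is not Groundtruth's actual schema or query: the `eval_results` table, its columns, the `compare_runs` helper, and the sample rows are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema (an assumption, not Groundtruth's real one):
# one row per labeled example per model version.
SCHEMA = """
CREATE TABLE eval_results (
    example_id    INTEGER,
    slice         TEXT,
    model_version TEXT,
    passed        INTEGER  -- 1 if the output was judged correct, else 0
)
"""

# The "chart" is just this query. GROUP BY aggregates per slice and
# version; ORDER BY fixes the row order so the result does not depend
# on insertion order.
COMPARE_SQL = """
SELECT slice, model_version, COUNT(*) AS n, AVG(passed) AS pass_rate
FROM eval_results
GROUP BY slice, model_version
ORDER BY slice, model_version
"""


def compare_runs(rows):
    """Load rows into an in-memory database and run the comparison query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO eval_results VALUES (?, ?, ?, ?)", rows)
        return conn.execute(COMPARE_SQL).fetchall()
    finally:
        conn.close()


# Made-up sample data for illustration only.
SAMPLE_ROWS = [
    (1, "refunds", "v1", 1),
    (2, "refunds", "v1", 0),
    (1, "refunds", "v2", 1),
    (2, "refunds", "v2", 1),
    (3, "shipping", "v1", 1),
    (3, "shipping", "v2", 0),
]

if __name__ == "__main__":
    for row in compare_runs(SAMPLE_ROWS):
        print(row)
```

Because the query is plain aggregation with an explicit ordering, the same labeled rows always produce the same table, which is the property the deterministic-output claim relies on.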