§ Brief
We kept rebuilding the same eval framework for every new project, so we extracted it into Groundtruth. It ingests your existing logs, helps you label slices, and runs structured comparisons across model versions.
§ AI use
- Slice labeling is assisted by both GPT-4o and Claude, for diversity.
- Structured comparison output is deterministic: every chart is a SQL query.
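
To illustrate what "every chart is a SQL query" could look like in practice, here is a minimal sketch using Python's built-in `sqlite3`. The `results` table, its columns (`model_version`, `slice`, `correct`), the sample rows, and the `compare_versions` helper are all assumptions made for illustration; they are not Groundtruth's actual schema or API.

```python
import sqlite3

# Hypothetical data: one row per logged example, already labeled with a slice.
# Columns: (model_version, slice, correct)
ROWS = [
    ("v1", "short_prompts", 1),
    ("v1", "short_prompts", 0),
    ("v1", "long_prompts", 0),
    ("v1", "long_prompts", 0),
    ("v2", "short_prompts", 1),
    ("v2", "short_prompts", 1),
    ("v2", "long_prompts", 1),
    ("v2", "long_prompts", 0),
]

# Per-slice accuracy for two model versions, side by side.
# AVG skips the NULLs produced by the CASE for rows of the other version.
# ORDER BY fixes the row order, so the same data always yields the same output.
COMPARISON_SQL = """
SELECT slice,
       AVG(CASE WHEN model_version = ? THEN correct END) AS acc_baseline,
       AVG(CASE WHEN model_version = ? THEN correct END) AS acc_candidate
FROM results
GROUP BY slice
ORDER BY slice
"""


def compare_versions(rows, baseline, candidate):
    """Load rows into an in-memory database and run the comparison query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE results (model_version TEXT, slice TEXT, correct INTEGER)"
        )
        conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
        return conn.execute(COMPARISON_SQL, (baseline, candidate)).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for slice_name, acc_base, acc_cand in compare_versions(ROWS, "v1", "v2"):
        print(f"{slice_name}: v1={acc_base:.2f} v2={acc_cand:.2f}")
```

Because the chart data comes from a plain aggregate query with a fixed ordering, rerunning it on the same labeled logs gives identical numbers, which is the property the bullet above describes. Any model involvement would sit upstream, in producing the slice labels.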