See real data-cleaning tasks run live.

CleanOps simulates the kind of operational cleanup analysts actually do before data reaches a CRM, warehouse, or billing system. The UI below runs the same hosted benchmark API used by the evaluator.


Changing the seed reorders the visible preview and the compare view. It does not change the task score itself.
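To illustrate how a seed can reorder a preview without moving the score, here is a minimal sketch. It assumes the preview is a seeded shuffle and the score is an order-independent aggregate. The sample rows and the names `preview_rows` and `score_rows` are hypothetical and are not taken from the CleanOps implementation.

```python
import random

# Hypothetical dirty rows; None marks a missing email.
ROWS = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": None},
]


def preview_rows(rows, seed):
    """Return a copy of rows in an order determined only by the seed."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def score_rows(rows):
    """Fraction of rows with a non-missing email; independent of row order."""
    return sum(r["email"] is not None for r in rows) / len(rows)


preview_a = preview_rows(ROWS, seed=7)
preview_b = preview_rows(ROWS, seed=8)

# Same rows in a possibly different order, so the score is identical.
print(score_rows(preview_a) == score_rows(preview_b))  # True
```

Because the score only counts rows, any permutation of the same rows yields the same value, which is the property the note above describes.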

Fixed tasks, typed actions, shaped rewards, and reproducible graders.

At a glance

Task ladder: Easy → Hard

Core API: /reset, /step, /state

Domain: CRM + Orders + Billing

Reward signal: Dense + partial progress
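A dense, partial-progress reward is often computed as the change in a grader score after each step. The sketch below assumes that design. The functions `grade` and `step_reward` and the sample rows are hypothetical; this page does not describe the actual CleanOps reward function.

```python
def grade(rows, expected):
    """Fraction of cells that match the expected clean output (0.0 to 1.0)."""
    total = 0
    matched = 0
    for row, want in zip(rows, expected):
        for key, value in want.items():
            total += 1
            matched += row.get(key) == value
    return matched / total if total else 1.0


def step_reward(before, after, expected):
    """Reward for one step: how much the grader score improved."""
    return grade(after, expected) - grade(before, expected)


expected = [{"name": "Ada", "email": "ada@example.com"}]
dirty = [{"name": " ada ", "email": "ADA@EXAMPLE.COM"}]
half_clean = [{"name": "Ada", "email": "ADA@EXAMPLE.COM"}]

# Fixing one of two cells earns half of the available reward.
print(step_reward(dirty, half_clean, expected))  # 0.5
```

With this shape, an agent that fixes only some cells still receives a positive signal, and a step that damages clean data receives a negative one.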

This homepage is a thin demo over the live environment. It doesn’t fake results: every task button calls the deployed API.

Live Task Snapshot

The cards and table below are populated from a real POST /reset response. Use the task buttons above to switch between benchmark scenarios, or choose your own task and seed.

Snapshot cards: Task, Seed Used, Initial Score, Validation Issues, Focus Table Rows.

Objective

Validation Issues

Available Operations

Before / After Cleaning


Dirty input

Expected clean output

Focus Table Preview

Raw Demo Payload


API & Submission Notes

The evaluator checks these endpoints directly. This page exists to make the environment easier to inspect visually.

GET /health
Service liveness check


GET /schema
Typed OpenEnv schema


GET /docs
Interactive FastAPI docs


POST /reset
Start a task episode

live

POST /step
Apply a typed action

live

GET /state
Inspect current environment state

live
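The three live endpoints suggest a simple episode loop: reset, apply typed actions with step, then inspect state. The following is a rough sketch under several assumptions: the `CleanAction` fields, the `reward` and `done` response keys, and sending the action as the raw `/step` body are all hypothetical. The real shapes should be read from `GET /schema`. The HTTP layer is passed in as a `transport` callable so the loop can be exercised without a network.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Callable, Dict, Iterable, Tuple


@dataclass
class CleanAction:
    """Hypothetical typed action; real field names come from GET /schema."""
    operation: str
    column: str
    params: Dict[str, Any] = field(default_factory=dict)


# transport(method, path, body) returns the parsed JSON response as a dict.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def run_episode(
    transport: Transport,
    task_id: str,
    seed: int,
    actions: Iterable[CleanAction],
) -> Tuple[float, Dict[str, Any]]:
    """Run one episode and return (total reward, final state)."""
    transport("POST", "/reset", {"task_id": task_id, "seed": seed})
    total = 0.0
    for action in actions:
        # Assumption: /step accepts the serialized action as its JSON body.
        result = transport("POST", "/step", asdict(action))
        # Assumption: the response carries "reward" and "done" keys.
        total += float(result.get("reward", 0.0))
        if result.get("done"):
            break
    state = transport("GET", "/state", {})
    return total, state
```

Keeping the action as a dataclass means malformed actions fail at construction time on the client, before any request is sent.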

Sample curl

# Set BASE_URL to the deployed environment's base URL (not shown on this page).
curl -X POST "$BASE_URL/reset" -H "Content-Type: application/json" -d '{"task_id":"customer_contacts_easy","seed":7}'
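The same request can be built in Python with only the standard library. This is a minimal sketch: `build_reset_request` is a hypothetical helper, and `http://localhost:8000` is a placeholder base URL, since the deployment address is not given on this page.

```python
import json
import urllib.request


def build_reset_request(base_url, task_id, seed):
    """Build (but do not send) a POST /reset request with a JSON body."""
    body = json.dumps({"task_id": task_id, "seed": seed}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/reset",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder base URL; replace it with the real deployment address.
req = build_reset_request("http://localhost:8000", "customer_contacts_easy", 7)

# To send it against a running deployment:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     snapshot = json.load(resp)
```

Separating request construction from sending makes it easy to inspect the method, URL, and payload before calling the live API.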