CleanOps OpenEnv: operational data cleaning benchmark
OpenEnv Benchmark
Real-world Data Cleaning
Deterministic Graders

See real data-cleaning tasks run live.

CleanOps simulates the operational cleanup that analysts actually do before data reaches a CRM, warehouse, or billing system. The UI below calls the same hosted benchmark API that the evaluator uses.

Changing the seed changes the ordering of the preview and the compare view; it does not change the task's score.
Fixed tasks, typed actions, shaped rewards, and reproducible graders.

At a glance

Task ladder: Easy → Hard
Core API: /reset, /step, /state
Domain: CRM, orders, billing
Reward signal: dense, with partial-progress credit

This homepage is a thin demo over the live environment. It doesn’t fake results: every task button calls the deployed API.

Live Task Snapshot

The cards and table below are populated from a real POST /reset response. Use the task buttons above to switch between benchmark scenarios, or choose your own task and seed.

Task
Seed Used
Initial Score
Validation Issues
Focus Table Rows

Objective


Validation Issues

Available Operations

Before / After Cleaning

Dirty input
Expected clean output

Focus Table Preview

Raw Demo Payload


API & Submission Notes

The evaluator checks these endpoints directly. This page exists to make the environment easier to inspect visually.

GET /health: service liveness check
GET /schema: typed OpenEnv schema
GET /docs: interactive FastAPI docs
POST /reset: start a task episode
POST /step: apply a typed action
GET /state: inspect the current environment state
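To inspect the read-only endpoints from a terminal instead of the page, a small shell sketch like the following may help. It is an illustration only: `BASE_URL` is an assumption (this page does not show the deployed host, so the default below is a placeholder), and the helper names `api_url` and `probe` are hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical helpers for probing the CleanOps API.
# Assumption: BASE_URL is the deployed environment's URL; the default is a placeholder.
BASE_URL="${BASE_URL:-https://your-cleanops-host.example}"

# Join BASE_URL and an endpoint path, dropping a trailing slash from BASE_URL.
api_url() {
  printf '%s%s' "${BASE_URL%/}" "$1"
}

# Send a GET request to the given path and print the body followed by the HTTP status.
probe() {
  echo "GET $1"
  curl -sS -w '\nHTTP %{http_code}\n' "$(api_url "$1")"
}
```

For example, `probe /health` confirms the service is up, and `probe /schema` prints the typed schema that defines valid actions.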

Sample curl

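# Set BASE_URL to the deployed environment's URL first; the host is not shown on this page.
curl -X POST "$BASE_URL/reset" -H "Content-Type: application/json" -d '{"task_id":"customer_contacts_easy","seed":7}'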
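The sample covers only `/reset`. A full episode resets a task, applies a typed action through `/step`, and then reads `/state`. The sketch below strings those calls together under stated assumptions: `BASE_URL` is a placeholder for the deployed host, the helper names are hypothetical, and the `/step` request body is passed in by the caller because this page does not document the action fields (GET /schema does).

```shell
#!/usr/bin/env sh
# Hypothetical episode runner for the CleanOps API.
# Assumption: BASE_URL is the deployed environment's URL; the default is a placeholder.
BASE_URL="${BASE_URL:-https://your-cleanops-host.example}"

# Build the /reset body from a task id and an integer seed, matching the curl sample.
reset_body() {
  printf '{"task_id":"%s","seed":%d}' "$1" "$2"
}

# POST a JSON body ($2) to an endpoint path ($1).
post_json() {
  curl -sS -X POST "${BASE_URL%/}$1" -H "Content-Type: application/json" -d "$2"
}

# $1 = task_id, $2 = seed, $3 = /step request body (JSON shaped per GET /schema).
run_episode() {
  post_json /reset "$(reset_body "$1" "$2")" || return 1
  echo
  post_json /step "$3" || return 1
  echo
  curl -sS "${BASE_URL%/}/state"
}
```

Usage, with a hypothetical `step.json` file holding one action: `run_episode customer_contacts_easy 7 "$(cat step.json)"`. Keeping the action body external avoids hard-coding field names that only the live schema defines.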