AI · April 13, 2026 · 5 min read

Designing human-in-the-loop AI for grading

AI is good at drafting. It is not good at being accountable. For anything with real consequences - grades, compliance flags, money - the model should produce a draft and a human should make the decision.

Every grade BUC Populi generates is held in a pending_review state. Nothing is pushed back to the LMS until a person approves it. The AI compresses hours of work into minutes; the human keeps the authority.
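The gate described above can be sketched as a small state check. This is a minimal illustration in Python, not BUC Populi's actual code: the `Grade` class, `approve`, and `push_to_lms` are hypothetical names, and the only detail taken from the post is the pending_review state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"  # assumed name for the post-review state


@dataclass
class Grade:
    score: int
    max_score: int
    status: Status = Status.PENDING_REVIEW  # every AI draft starts here
    approved_by: Optional[str] = None


def approve(grade: Grade, reviewer: str) -> Grade:
    """Record a human decision; the only path out of pending_review."""
    if not reviewer:
        raise ValueError("approval requires a named human reviewer")
    grade.status = Status.APPROVED
    grade.approved_by = reviewer
    return grade


def push_to_lms(grade: Grade) -> dict:
    """Hypothetical LMS sync that refuses anything a person has not approved."""
    if grade.status is not Status.APPROVED:
        raise PermissionError("grade is still pending review")
    # A real client would call the LMS API here; this sketch returns the payload.
    return {
        "score": grade.score,
        "max": grade.max_score,
        "approved_by": grade.approved_by,
    }
```

Putting the check inside the push function, rather than in the UI, means no code path can skip the review step by accident.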

Structured output makes review possible

```json
{
  "score": 87,
  "max": 100,
  "status": "pending_review",
  "criteria": [
    { "name": "Thesis", "points": 18, "of": 20 },
    { "name": "Evidence", "points": 22, "of": 25 }
  ],
  "feedback": "Strong argument; tighten the conclusion."
}
```

Rubric-driven grading response
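Because the response has a fixed shape, it can be checked mechanically before a reviewer sees it. The sketch below is one way to do that in Python, assuming the field names from the response above; `validate_grade` and the specific rules it enforces are illustrative assumptions, not the product's validator.

```python
from typing import List


def validate_grade(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the draft is reviewable."""
    problems = []

    score, max_score = payload.get("score"), payload.get("max")
    if not isinstance(score, (int, float)) or not isinstance(max_score, (int, float)):
        problems.append("score and max must be numbers")
    elif not 0 <= score <= max_score:
        problems.append("score must be between 0 and max")

    # Model output should never arrive already approved.
    if payload.get("status") != "pending_review":
        problems.append("status must be pending_review")

    for criterion in payload.get("criteria", []):
        if not isinstance(criterion, dict):
            problems.append("each criterion must be an object")
            continue
        name = criterion.get("name", "<unnamed>")
        points, of = criterion.get("points"), criterion.get("of")
        if not isinstance(points, (int, float)) or not isinstance(of, (int, float)):
            problems.append(f"{name}: points and of must be numbers")
        elif not 0 <= points <= of:
            problems.append(f"{name}: points must be between 0 and {of}")

    return problems
```

Returning a list of problems instead of raising lets the review UI show every issue next to the draft at once.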

Defensive parsing matters too. Models occasionally wrap JSON in prose or trail a stray token, so the client tolerates and repairs malformed output rather than crashing the review queue.
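A minimal sketch of that tolerance in Python, using only the standard library. It covers the two failure modes named above (JSON wrapped in prose, and a trailing stray token); `parse_model_json` is a hypothetical helper, and returning None for unrecoverable output is an assumption about how a client might flag a draft for manual handling.

```python
import json
from typing import Optional


def parse_model_json(text: str) -> Optional[dict]:
    """Return the first JSON object found in text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores
            # whatever follows it, so trailing tokens do not cause an error.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one.
            start = text.find("{", start + 1)
    return None


wrapped = 'Here is the grade: {"score": 87, "max": 100} Hope that helps!'
print(parse_model_json(wrapped))
```

A None result can be routed to a person as "could not parse" rather than raising inside the review queue.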

  • Never auto-push a model decision with real consequences.
  • Force structured output so the UI can diff and validate.
  • Repair, don't crash, on malformed model responses.