AI is good at drafting. It is not good at being accountable. For anything with real consequences - grades, compliance flags, money - the model should produce a draft and a human should make the decision.
Every grade BUC Populi generates is held at a pending_review state. Nothing is pushed back to the LMS until a person approves it. The AI compresses hours of work into minutes; the human keeps the authority.
Structured output makes review possible
json
{
"score": 87,
"max": 100,
"status": "pending_review",
"criteria": [
{ "name": "Thesis", "points": 18, "of": 20 },
{ "name": "Evidence", "points": 22, "of": 25 }
],
"feedback": "Strong argument; tighten the conclusion."
}
Rubric-driven grading response
Defensive parsing matters too. Models occasionally wrap JSON in prose or trail a stray token, so the client tolerates and repairs malformed output rather than crashing the review queue.
- Never auto-push a model decision with real consequences.
- Force structured output so the UI can diff and validate.
- Repair, don't crash, on malformed model responses.