Quality Metrics
You cannot improve what you cannot measure, and most agent quality is not obviously measurable. Task completion looks binary, but it collapses the moment you ask whether the task was completed correctly, efficiently, and safely at the same time.

The metrics teams actually track in production are task completion rate, correctness, token and tool-call efficiency, latency, and error rate. The metrics they aspire to track, such as code quality or architectural appropriateness, resist automated measurement and require either a large language model acting as a judge or periodic human review.

Defining metrics before building is critical because they determine what you optimize for. Measure only completion rate and you get agents that technically finish tasks while producing low-quality output; ignore cost and latency and you get agents that are correct but too slow or expensive to run.
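The gap between "completed" and "completed correctly" can be made concrete with a small sketch. This is an illustration, not a standard API: the `AgentRun` record and `summarize` function are hypothetical names, and the fields simply mirror the production metrics listed above.

```python
from dataclasses import dataclass

# Hypothetical per-run record; field names are illustrative.
@dataclass
class AgentRun:
    completed: bool    # agent reported the task as finished
    correct: bool      # output passed verification (tests, judge, or human review)
    tokens: int        # total tokens consumed (prompt + completion)
    tool_calls: int    # number of tool invocations
    latency_s: float   # wall-clock time for the run
    errored: bool      # run ended in an unhandled error

def summarize(runs: list[AgentRun]) -> dict[str, float]:
    """Aggregate the production metrics over a batch of runs."""
    n = len(runs)
    return {
        "completion_rate": sum(r.completed for r in runs) / n,
        "correctness_rate": sum(r.correct for r in runs) / n,
        "avg_tokens": sum(r.tokens for r in runs) / n,
        "avg_tool_calls": sum(r.tool_calls for r in runs) / n,
        "avg_latency_s": sum(r.latency_s for r in runs) / n,
        "error_rate": sum(r.errored for r in runs) / n,
    }

runs = [
    AgentRun(True, True, 12_000, 8, 41.0, False),
    AgentRun(True, False, 30_000, 25, 112.0, False),  # "finished", but wrong
    AgentRun(False, False, 5_000, 3, 15.0, True),
]
summary = summarize(runs)
```

On this batch, completion rate is 2/3 while correctness is only 1/3, which is exactly the divergence that optimizing on completion rate alone would hide.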