Model Migration Update

We have migrated to Claude Haiku 4.5. This move provides significant speed and price improvements over Sonnet 4.5, while delivering a performance boost over Haiku 3.5 and Kimi K2.

GPT-5 Mini

Status: Rejected

Tested but rejected due to latency issues: Time to First Token averaged 25 seconds, and 2 of 5 responses failed with network errors.

Tool Calling Reliability

Good but unusable due to latency

Strengths

Good reasoning capabilities
Strong performance on complex tasks

Weaknesses

Time to First Token averaged ~25 seconds
Network errors in 2 of 5 responses
Likely input token limit issues (the suspected cause of the network errors)

Key Notes

Latency makes it unsuitable for production
Network errors suggest input token limit issues

Analysis

GPT-5 Mini showed promise in reasoning and complex tasks but was rejected due to unacceptable latency. A Time to First Token averaging 25 seconds, combined with network errors in 2 of 5 responses, made it unsuitable for real-time educational applications.
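The two figures cited above (average Time to First Token and the share of failed responses) can be collected with a small harness. The following is a minimal sketch, not the harness used in this evaluation: the names `time_to_first_token` and `summarize` are hypothetical, `start_stream` stands in for whatever function opens a streaming response from the model under test, and it assumes network failures surface as Python's built-in `ConnectionError`.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List, Optional


def time_to_first_token(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[float]:
    """Return seconds until the first chunk arrives, or None on failure.

    `start_stream` is a hypothetical callable that sends the request and
    returns an iterable of streamed text chunks.
    """
    started = clock()
    try:
        for _chunk in start_stream():
            # Stop timing as soon as the first chunk is received.
            return clock() - started
    except ConnectionError:
        # Assumption: network errors are raised as ConnectionError.
        return None
    return None  # the stream ended without producing any chunk


def summarize(results: List[Optional[float]]) -> dict:
    """Aggregate per-request results into trial count, failures, and mean TTFT."""
    succeeded = [r for r in results if r is not None]
    return {
        "trials": len(results),
        "failures": len(results) - len(succeeded),
        "mean_ttft_s": mean(succeeded) if succeeded else None,
    }
```

Running `time_to_first_token` once per test prompt and passing the collected values to `summarize` gives the failure count and mean Time to First Token in one place. Failed requests are excluded from the mean, so a slow model and an unreliable model show up as separate signals.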
