• Playlab

Model Migration Update

We have migrated to Claude Haiku 4.5. It is significantly faster and cheaper than Sonnet 4.5, and it performs better than Haiku 3.5 and Kimi K2.

Model Evaluation

A comprehensive evaluation of AI models tested for production use in educational apps in Ghana. Focus areas are tool-calling reliability, formatting quality, cost-effectiveness, and instruction following.

Benchmark Comparisons

Benchmarks are embedded inline where available, with a pop-out link for each.

Gorilla ToolBench leaderboard for complex tool-use tasks from UC Berkeley.

Interactive model comparison with throughput, cost, and latency views.

Marketplace of models with live pricing and usage routes.

Model Comparison

| Model | Status | Tool Calling | Input Cost | Output Cost |
|---|---|---|---|---|
| Claude Sonnet 4.5 | baseline | N/A | $3/million tokens | $15/million tokens |
| Kimi K2 | selected | High - best among cheap models | N/A | N/A |
| Gemini 2.5 Flash: Preview | disqualified | Poor - ~4% failure rate | N/A | N/A |
| Qwen | under evaluation | Good | N/A | N/A |
| GPT-5 Mini | rejected | Good but unusable due to latency | N/A | N/A |
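
To make two figures from the table above concrete, here is a minimal Python sketch. It estimates the cost of one request at the Sonnet 4.5 baseline prices ($3 and $15 per million input and output tokens) and shows how a ~4% per-call tool-calling failure rate might compound over a multi-step session. The function names, the token counts, the ten-call session length, and the assumption that failures are independent are all illustrative and do not come from the evaluation itself.

```python
# Baseline prices taken from the comparison table (USD per million tokens).
SONNET_INPUT_USD_PER_MTOK = 3.00
SONNET_OUTPUT_USD_PER_MTOK = 15.00


def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price: float = SONNET_INPUT_USD_PER_MTOK,
    output_price: float = SONNET_OUTPUT_USD_PER_MTOK,
) -> float:
    """Cost of one request given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def session_failure_probability(per_call_failure_rate: float, calls: int) -> float:
    """Chance that at least one of `calls` tool calls fails.

    Assumes failures are independent, which is a simplification.
    """
    return 1 - (1 - per_call_failure_rate) ** calls


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 500 output tokens.
    print(f"cost per request: ${request_cost_usd(2_000, 500):.4f}")  # $0.0135

    # Hypothetical session with 10 tool calls at a 4% per-call failure rate.
    print(f"session failure: {session_failure_probability(0.04, 10):.1%}")  # 33.5%
```

Under these assumptions, roughly one session in three would hit at least one failed tool call, which suggests why a per-call failure rate that looks small can still be disqualifying for multi-step use.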

Individual Model Pages