Model Migration Update

We have migrated to Claude Haiku 4.5. It is significantly faster and cheaper than Sonnet 4.5, and it outperforms Haiku 3.5 and Kimi K2.

Qwen

Status: Under evaluation

Qwen generally performs well but still requires refinement to control unwanted behaviors.

Tool Calling Reliability

Good

Strengths

Generally performs well
Successfully follows most instructions

Weaknesses

Excessive use of emojis and emoticons
Tendency toward sycophantic behavior
Not as naturally aligned with desired formatting

Key Notes

Requires additional prompt engineering to control emoji usage
Needs refinement for professional tone
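
One way to check whether a prompt change actually reduces emoji usage is to count emojis and emoticons in sampled responses. The following is a minimal sketch, not the team's tooling: `count_emoji` is a hypothetical helper, and both the Unicode ranges and the emoticon pattern are assumptions that cover common cases rather than every emoji.

```python
import re

# Assumption: these ranges cover the most common emoji blocks. They are not
# an exhaustive definition of "emoji".
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)

# Assumption: a small set of ASCII emoticons such as :) ;-) :D =P.
# The lookarounds avoid matching inside words or numbers (e.g. "10:30").
EMOTICON_PATTERN = re.compile(r"(?<!\w)[:;=]-?[)(DPp](?!\w)")


def count_emoji(text: str) -> int:
    """Count emoji characters and ASCII emoticons in a model response."""
    return len(EMOJI_PATTERN.findall(text)) + len(EMOTICON_PATTERN.findall(text))
```

Running this over the same set of test inputs before and after a prompt revision gives a rough, comparable number for how much emoji usage changed.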

Detailed Notes & Findings

Strengths

Clarification: Qwen handles nonsensical or deliberately confusing inputs very well. It asks for clarification before moving on.
Natural Formatting: Qwen's formatting is already very good out of the box.

Areas for Improvement

Over-eager Execution: Qwen sometimes works through too many steps in a single response. It needs explicit prompting to limit how many steps it takes at once.
Verbose Steps: Needs explicit instructions not to label its output with "STEP 1", "STEP 2", and so on.
Over-engineering: Because Qwen's formatting is naturally good, applying strict constraints (like those written for Gemini) results in over-engineered output.
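
The step-label issue above can be checked mechanically when reviewing responses. This is a rough sketch under the assumption that labels appear at the start of a line, optionally wrapped in markdown markers; `find_step_labels` is a hypothetical helper name, not part of any existing tooling.

```python
import re

# Matches a line that opens with a label such as "STEP 1", "Step 2:" or
# "**STEP 3**". Assumption: labels always appear at the start of a line.
STEP_LABEL = re.compile(r"\s*[*#]*\s*step\s+\d+\b", re.IGNORECASE)


def find_step_labels(response: str) -> list[str]:
    """Return the lines of `response` that start with an explicit step label."""
    return [
        line.strip()
        for line in response.splitlines()
        if STEP_LABEL.match(line)
    ]
```

An empty result suggests the instruction against step labels is being followed for that response.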

Formatting Observations

Applying Gemini-specific constraints to Qwen degrades its performance through over-engineering. Qwen handles formatting well natively and should not be constrained as heavily.
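
One possible way to act on this observation is to keep prompt constraints per model family, so that Qwen receives only behavior fixes while Gemini keeps its stricter formatting rules. The sketch below is illustrative only: the base prompt, the rule text, and the names `BASE_PROMPT`, `MODEL_RULES`, and `build_system_prompt` are all hypothetical and do not reflect the actual prompts in use.

```python
# Hypothetical base prompt; the real one is not part of these notes.
BASE_PROMPT = "You are a helpful assistant."

# Hypothetical rule text, keyed by model family.
MODEL_RULES = {
    # Assumption: Gemini keeps a full set of strict formatting constraints.
    "gemini": [
        "Use at most one heading per response.",
        "Use bullet lists only for three or more items.",
        "Keep paragraphs under four sentences.",
    ],
    # Qwen formats well natively, so it gets behavior fixes only and no
    # layout constraints.
    "qwen": [
        "Do not use emojis or emoticons.",
        "Do not label steps as 'STEP 1', 'STEP 2', and so on.",
        "Complete one step at a time before continuing.",
    ],
}


def build_system_prompt(model_family: str) -> str:
    """Join the base prompt with the rules for one model family."""
    rules = MODEL_RULES.get(model_family.lower(), [])
    return "\n".join([BASE_PROMPT, *(f"- {rule}" for rule in rules)])
```

Keeping the rules in separate lists makes it harder to accidentally carry one model's constraints over to another.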

Formatting Issues Reference

QWEN Formatting issues from overengineeering.pdf (See attachments)

Prototype Apps

Embedded prototypes demonstrating Qwen performance across different subjects.

English App

Economics App

Intervention App

Social Studies App

Mathematics App