Qwen
under evaluation
Still under evaluation. Qwen generally performs well but needs prompt refinement to control unwanted behaviors such as emoji overuse and sycophancy.
Tool Calling Reliability
Good
Strengths
- Generally performs well
- Successfully follows most instructions
Weaknesses
- Excessive use of emojis and emoticons
- Tendency toward sycophantic behavior
- Not as naturally aligned with desired formatting
Key Notes
- Requires additional prompt engineering to control emoji usage
- Needs refinement for professional tone
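One way to track whether prompt changes actually reduce emoji usage is to count emojis and emoticons in each response during evaluation. The following is a minimal sketch, not part of the existing evaluation setup: the function name `count_emojis` and the character ranges are assumptions chosen for illustration, and the ranges are deliberately rough rather than a complete emoji definition.

```python
import re

# Rough Unicode ranges covering most emoji (assumption: not exhaustive).
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)

# Common ASCII emoticons such as :) ;-) :D :(
# The lookarounds avoid matching inside words or identifiers.
EMOTICON_PATTERN = re.compile(r"(?<!\w)[:;=]-?[)(DPp](?!\w)")


def count_emojis(text: str) -> int:
    """Return the number of emoji characters plus ASCII emoticons in text."""
    return len(EMOJI_PATTERN.findall(text)) + len(EMOTICON_PATTERN.findall(text))
```

A count above zero on a response can be treated as a failed check for the professional-tone requirement, which makes it easy to compare prompt variants side by side.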
Detailed Notes & Findings
Strengths
- Clarification: Qwen handles nonsensical inputs very well. When you try to stump it, it asks for clarification before moving on.
- Natural Formatting: Qwen's formatting is already very good out of the box.
Areas for Improvement
- Over-eager Execution: Qwen sometimes works through too many steps in a single response. It needs explicit prompting to limit how far it proceeds in one turn.
- Verbose Steps: Needs explicit instructions not to label its output with "STEP 1", "STEP 2", and so on.
- Over-engineering: Because Qwen's formatting is naturally good, applying strict constraints (like those used for Gemini) leads to over-engineered output.
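The step-label issue above lends itself to an automated check. The sketch below flags responses that open a line with a label such as "STEP 1" or "**Step 2**". The name `has_step_labels` and the exact pattern are assumptions for illustration, not an existing part of the evaluation tooling.

```python
import re

# Matches a line that begins with "STEP <number>", optionally preceded by
# markdown markers such as "#", "*", or "-". Case-insensitive.
STEP_LABEL = re.compile(
    r"^\s*(?:[#*\-]+\s*)?STEP\s+\d+\b",
    re.IGNORECASE | re.MULTILINE,
)


def has_step_labels(text: str) -> bool:
    """Return True if any line of text starts with an explicit step label."""
    return bool(STEP_LABEL.search(text))
```

Because the pattern is anchored to the start of a line, ordinary prose that merely mentions a step partway through a sentence is not flagged.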
Formatting Observations
Applying Gemini-specific constraints to Qwen degrades its performance through over-engineering. Qwen handles formatting well natively and shouldn't be constrained as heavily.
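One way to act on this observation is to keep prompt rules per model instead of sharing a single constraint set. The sketch below is illustrative only: the base prompt, the rule wording, and the names `MODEL_RULES` and `build_system_prompt` are hypothetical, and the Gemini rules are placeholders standing in for whatever strict formatting constraints are actually in use.

```python
# Hypothetical base prompt shared by all models.
BASE_PROMPT = "You are a helpful assistant."

# Per-model rules. Qwen gets behavioral rules only (tone, emojis, pacing)
# and no formatting constraints, since its native formatting is already good.
MODEL_RULES = {
    "gemini": [
        # Placeholder formatting constraints (assumed, not the real ones).
        "Use short paragraphs and bulleted lists.",
        "Use headings only when the answer has multiple sections.",
    ],
    "qwen": [
        "Do not use emojis or emoticons.",
        "Keep a professional, neutral tone and avoid flattery.",
        "Do not label steps as 'STEP 1', 'STEP 2', and so on.",
        "Complete one step at a time and wait for the user before continuing.",
    ],
}


def build_system_prompt(model: str) -> str:
    """Return the base prompt followed by the rules for the given model."""
    rules = MODEL_RULES.get(model.lower(), [])
    if not rules:
        return BASE_PROMPT
    rule_lines = "\n".join(f"- {rule}" for rule in rules)
    return f"{BASE_PROMPT}\n\nRules:\n{rule_lines}"
```

Keeping the rule sets separate means a formatting constraint added for Gemini cannot silently leak into the Qwen prompt.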
Formatting Issues Reference
QWEN Formatting issues from overengineeering.pdf (See attachments)
Prototype Apps
Embedded prototypes demonstrating Qwen performance across different subjects.