Gemini 2.5 Flash: Preview
Completely disqualified due to unacceptable tool calling failure rate (~4% - 1 in 25 interactions).
Tool Calling Reliability
Poor - ~4% failure rate
Strengths
- Low cost
- Fast response times
Weaknesses
- Tool calling errors ~4% of the time
- Attempts to use XML tags for tool calls (incorrect method)
- Displays raw JSON to users
- Poor user experience
Key Notes
- Attempted to use XML tags to call tools
- XML tags rendered on screen instead of being processed
- Sometimes displayed raw JSON as first message to users
- Completely eliminated from consideration
Analysis
Gemini 2.5 Flash was disqualified due to critical tool calling failures. The model attempted to use XML tags for tool calls, which broke functionality and created terrible user experience. The ~4% failure rate was unacceptable for production use.
Detailed Notes & Findings
Critical Implementation Notes
- Mandatory Tool Calls: You must EXPLICITLY tell Gemini that the first tool call is MANDATORY.
- MCQ Verification: MUST ADD A DISCLAIMER THAT MCQs Must be double checked and resorted.
- Focus: Gemini is good at staying on task and avoiding anything to dissuade.
Mathematics & Scientific Formatting
Math apps need explicit instructions to ensure compatibility with Word processors:
"When writing mathematical, scientific, or chemical expressions, use Unicode symbols instead of LaTeX or Markdown to preserve formatting when copied into Microsoft Word or other word processors. This ensures accurate rendering of equations, subscripts, superscripts, and special characters. Avoid formats that require special rendering environments."
Examples of proper formatting:
- Chemical formulas: H₂O, CO₂, NH₃
- Mathematical expressions: x², y³, n₁ + n₂, α + β = γ, √
- Greek letters: α, β, γ, π, θ, Δ
- Scientific notation: 1.5 × 10⁶, 3.2 × 10⁻⁴
- Fractions: ½, ¼, ¾
- Variables with Subscripts: vₒ, xᵢ, yₙ
Known Issues
XML Tool Call Confusion
Gemini mistakenly thought that the tool call was supposed to be formatted in XML when there was XML formatting used in the prompt. We may just have to stick with the markdown formatting of the prompt to designate categories.

JSON Rendering Failure
Example of Gemini failing to execute a tool call and instead rendering raw JSON output to the user.

Prototype Apps
Embedded prototypes demonstrating Gemini 2.5 Flash performance in various subject contexts.
English App
Open in new tabEconomics App
Open in new tabIntervention App
Open in new tabSocial Studies App
Open in new tabMathematics App
Embed link pending