• Playlab

Gemini 2.5 Flash: Preview

Status: Disqualified

Disqualified because of an unacceptable tool calling failure rate (~4%, or about 1 in 25 interactions).

Tool Calling Reliability

Poor - ~4% failure rate

Strengths

  • Low cost
  • Fast response times

Weaknesses

  • Tool calling errors ~4% of the time
  • Attempts to use XML tags for tool calls (incorrect method)
  • Displays raw JSON to users
  • Poor user experience

Key Notes

  • Attempted to use XML tags to call tools
  • XML tags rendered on screen instead of being processed
  • Sometimes displayed raw JSON as first message to users
  • Completely eliminated from consideration

Analysis

Gemini 2.5 Flash was disqualified because of critical tool calling failures. The model attempted to use XML tags for tool calls, which broke functionality and created a terrible user experience. A failure rate of ~4% is unacceptable for production use.
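To make the ~4% figure concrete, the sketch below estimates how likely a user is to hit at least one tool calling failure over several interactions. It assumes each interaction fails independently at the same rate, which the evaluation did not measure; the function name is hypothetical.

```python
def failure_probability(per_call_failure_rate: float, interactions: int) -> float:
    """Chance of at least one failure across `interactions` calls.

    Assumes failures are independent and equally likely on every call,
    which is a simplification rather than a measured property.
    """
    if not 0.0 <= per_call_failure_rate <= 1.0:
        raise ValueError("per_call_failure_rate must be between 0 and 1")
    if interactions < 0:
        raise ValueError("interactions must be non-negative")
    return 1.0 - (1.0 - per_call_failure_rate) ** interactions


if __name__ == "__main__":
    # 0.04 is the ~4% rate reported above (about 1 failure in 25 interactions).
    for n in (1, 10, 25, 50):
        print(f"{n:>3} interactions: {failure_probability(0.04, n):.1%}")
```

Under that assumption, a user who sends ten messages has roughly a one-in-three chance of seeing at least one failure.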

Detailed Notes & Findings

Critical Implementation Notes

  • Mandatory Tool Calls: You must explicitly tell Gemini that the first tool call is MANDATORY.
  • MCQ Verification: Add a disclaimer that multiple-choice questions (MCQs) must be double-checked and re-sorted.
  • Focus: Gemini is good at staying on task and resists attempts to steer it away.
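One way to apply the mandatory tool call note is to generate the system prompt from a small helper so the instruction is never omitted. The following is a minimal sketch, not Playlab's actual prompt: the function name, the section titles, and the tool name `load_lesson_context` are all hypothetical. It uses Markdown headings rather than XML tags to separate sections, in line with the XML issue described under Known Issues.

```python
def build_system_prompt(first_tool: str, role: str) -> str:
    """Assemble a Markdown-sectioned system prompt.

    `first_tool` is the name of the tool the model must call before replying.
    Sections are marked with Markdown headings, not XML tags.
    """
    sections = [
        ("Role", role),
        (
            "Tool use",
            f"Calling the `{first_tool}` tool first is MANDATORY. "
            "Do not reply to the user before that call has completed.",
        ),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


if __name__ == "__main__":
    # "load_lesson_context" is an illustrative tool name, not a real one.
    print(build_system_prompt("load_lesson_context", "You are a patient writing tutor."))
```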

Mathematics & Scientific Formatting

Math apps need explicit instructions so that their output stays compatible with word processors:

"When writing mathematical, scientific, or chemical expressions, use Unicode symbols instead of LaTeX or Markdown to preserve formatting when copied into Microsoft Word or other word processors. This ensures accurate rendering of equations, subscripts, superscripts, and special characters. Avoid formats that require special rendering environments."

Examples of proper formatting:

  • Chemical formulas: H₂O, CO₂, NH₃
  • Mathematical expressions: x², y³, n₁ + n₂, α + β = γ, √
  • Greek letters: α, β, γ, π, θ, Δ
  • Scientific notation: 1.5 × 10⁶, 3.2 × 10⁻⁴
  • Fractions: ½, ¼, ¾
  • Variables with subscripts: v₀, xᵢ, yₙ
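If the model still emits plain ASCII such as H2O, the same formatting can be produced in post-processing. The sketch below is an illustration under that assumption, not part of the evaluated apps; the helper names are hypothetical, and it covers only digit subscripts in chemical formulas and superscript exponents in scientific notation.

```python
import re

# Translation tables from ASCII digits and signs to Unicode sub/superscripts.
_SUBSCRIPTS = str.maketrans("0123456789+-", "₀₁₂₃₄₅₆₇₈₉₊₋")
_SUPERSCRIPTS = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")


def chemical_formula(plain: str) -> str:
    """Subscript digits that follow a letter or ')': 'H2O' becomes 'H₂O'.

    Leading coefficients such as the first 2 in '2H2O' are left unchanged.
    """
    return re.sub(
        r"(?<=[A-Za-z)])(\d+)",
        lambda m: m.group(1).translate(_SUBSCRIPTS),
        plain,
    )


def scientific_notation(mantissa: float, exponent: int) -> str:
    """Render mantissa × 10^exponent with a Unicode superscript exponent."""
    return f"{mantissa} × 10{str(exponent).translate(_SUPERSCRIPTS)}"


if __name__ == "__main__":
    print(chemical_formula("H2O"), chemical_formula("Ca(OH)2"))
    print(scientific_notation(1.5, 6), scientific_notation(3.2, -4))
```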

Known Issues

XML Tool Call Confusion

When the prompt itself used XML formatting, Gemini mistakenly assumed that tool calls should also be formatted as XML. The likely workaround is to use only Markdown formatting in the prompt to designate categories.

[Screenshot: Gemini confusing XML tool call formatting]
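A guard in the application layer could catch this failure before the tags reach the screen. The following is a rough sketch under stated assumptions: the tool names and the generic tag names (`tool_call`, `function_call`) are placeholders, since the exact tags Gemini produced are not recorded here.

```python
import re

# Hypothetical tool names; replace with the tools registered for the app.
KNOWN_TOOLS = frozenset({"search_curriculum", "generate_quiz"})

# Assumed generic tag names a model might invent for a tool call.
GENERIC_TAGS = frozenset({"tool_call", "function_call"})

# Matches opening or closing tags and captures the tag name.
_TAG = re.compile(r"<\s*/?\s*([A-Za-z_][\w\-]*)[^>]*>")


def leaked_xml_tool_tags(reply_text: str, tool_names=KNOWN_TOOLS) -> list[str]:
    """Return tag names in the visible reply that look like a tool call attempt."""
    suspicious = GENERIC_TAGS | set(tool_names)
    found = {m.group(1) for m in _TAG.finditer(reply_text)}
    return sorted(found & suspicious)


if __name__ == "__main__":
    reply = "<generate_quiz><topic>fractions</topic></generate_quiz>"
    print(leaked_xml_tool_tags(reply))
```

A non-empty result is a signal to suppress the message and retry or log the failure rather than show the tags to the user.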

JSON Rendering Failure

Example of Gemini failing to execute a tool call and instead rendering raw JSON output to the user.

[Screenshot: Gemini failing tool call and rendering JSON]
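The raw JSON case can be detected in a similar way. This sketch assumes that a user-facing message should never consist entirely of a JSON object or array; the function name is hypothetical, and an app that legitimately shows JSON to users would need a narrower check.

```python
import json


def looks_like_raw_json(reply_text: str) -> bool:
    """True when the whole visible reply parses as a JSON object or array."""
    stripped = reply_text.strip()
    # Cheap pre-check: only objects and arrays are treated as leaked payloads.
    if not stripped or stripped[0] not in "{[":
        return False
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_raw_json('{"name": "generate_quiz", "arguments": {}}'))
    print(looks_like_raw_json("Hello! What would you like to work on today?"))
```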

Prototype Apps

Embedded prototypes demonstrating Gemini 2.5 Flash performance in various subject contexts.

English App

Economics App

Intervention App

Social Studies App

Mathematics App

Embed link pending