RAG & Knowledge Base

This page documents the RAG (Retrieval-Augmented Generation) system implementation: the reference structure migration, the Search References Tool implementation, and Kimi-specific RAG challenges.

Reference Structure Migration

All apps were migrated from the previous reference structure to Teaghan's general .md file structure. This migration was part of the cost optimization and workflow improvement efforts.

Migration Note:

Almost none of these apps have any student manual uploaded. The migration focused on standardizing the reference structure across all subject-specific apps.

Search References Tool Implementation

Migration Details

Apps that formerly used only the traditional RAG method now all use the Search References Tool. The migration was completed on October 8, 2024, and affected multiple subject-specific apps, including:

  • Physics, Performing Arts, PEH (Elective)
  • Food and Nutrition, Clothing and Textiles
  • Government, Geography, Biology
  • Art and Design Foundation, Arts & Design Studio
  • Aviation, Electrical and Electronics
  • Multiple Ghanaian Language apps (Mfantse, Nzema, Gurene, Gonja, English)
  • Template apps for Teacher Planning

Cost Impact

Rough estimate: non-MCP RAG apps accounted for about $771 of non-optimized usage before the migration.

Knowledge Base Architecture

Document Types

The system was tested with all curriculum documents loaded into the knowledge base, including:

  • B1-B3 Mathematics curriculum
  • B4-B6 Mathematics curriculum
  • JHS Mathematics curriculum
  • Mathematics learning indicators strand-by-strand
  • Numerous reference materials (at least 25 documents)

File Format Support

The system can handle diverse file types including:

  • PDFs
  • PowerPoints
  • Word documents
  • Text files
  • Images

Retrieval Capabilities

  • Successfully retrieves context even with minimal information
  • Effectively narrows down relevant curriculum sections
  • Adapts to different subject areas (tested with English, Mathematics, Intervention Math, Economics)

Kimi-Specific RAG Challenges

Critical Issue: Reference Hallucination

Kimi models tend to respond as if they had read the references even when no tool call has been executed. This is due to:

  • Tool Use Training: Kimi was trained with simulated environments where it learned to predict what tool results would look like, developing strong priors about typical tool outputs.
  • Self-Critique Framework: The model trusts its "internal priors" (pre-training knowledge) and may prioritize what it "knows" over what it retrieves.

Solution:

The tool call MUST be executed before any step that needs the reference it retrieves. This prevents the model from hallucinating reference content from its internal knowledge.
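
The ordering rule above can be sketched in code. This is a minimal illustration, not the production workflow: `search_references` and `generate` are hypothetical callables standing in for the Search References Tool and the model call, and refusing to continue on an empty result is an assumed design choice.

```python
from typing import Callable, List


class ReferencesNotRetrieved(RuntimeError):
    """Raised when generation is attempted without retrieved references."""


def generate_with_references(
    query: str,
    search_references: Callable[[str], List[str]],  # hypothetical tool wrapper
    generate: Callable[[str], str],                 # hypothetical model call
) -> str:
    # Step 1: execute the tool call before any step that needs references.
    passages = search_references(query)

    # Step 2 (assumed policy): stop if nothing was retrieved, so the model
    # is never asked to write from references it has not seen.
    if not passages:
        raise ReferencesNotRetrieved(f"No references retrieved for: {query!r}")

    # Step 3: only now build the prompt, with the retrieved text inlined.
    numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    prompt = (
        "Use ONLY the retrieved references below.\n"
        f"Retrieved references:\n{numbered}\n\n"
        f"Task: {query}"
    )
    return generate(prompt)
```

Because the prompt is assembled only after the tool returns, there is no step at which the model is asked for reference-dependent content without the retrieved text in front of it.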

Retrieval Patterns and Issues

Misinterpretation Issue

Kimi misinterpreted the "GHANA REFERENCES" section of the prompt as the references themselves. Testing confirmed that, with NO REFERENCES supplied, the model still attempts to give strands, sub-strands, and more, indicating that it was using internal knowledge rather than retrieved content.
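
A no-reference test like the one described can be partly automated with a coarse text check. The sketch below is only a heuristic under stated assumptions: the term list and the function name `ungrounded_curriculum_terms` are hypothetical, and matching is plain keyword search rather than any real grounding measure.

```python
import re
from typing import List

# Hypothetical marker terms; adjust to the curriculum vocabulary in use.
CURRICULUM_TERMS = ("strand", "sub strand", "sub-strand")


def ungrounded_curriculum_terms(response: str, retrieved_text: str = "") -> List[str]:
    """Return curriculum terms that appear in the response but not in the
    retrieved text. With retrieved_text empty (the no-reference test), any
    hit suggests the model is answering from internal knowledge."""
    response_lower = response.lower()
    retrieved_lower = retrieved_text.lower()
    flagged = []
    for term in CURRICULUM_TERMS:
        # Match the term as a whole word, allowing a plural "s".
        pattern = r"\b" + re.escape(term) + r"s?\b"
        in_response = re.search(pattern, response_lower) is not None
        in_retrieved = re.search(pattern, retrieved_lower) is not None
        if in_response and not in_retrieved:
            flagged.append(term)
    return flagged
```

Running it on a response generated with no references supplied, for example `ungrounded_curriculum_terms(response_text)`, gives a quick signal to review by hand; a non-empty list is a prompt for inspection, not proof of hallucination.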

Best Practices

  • Always execute tool calls before referencing their results
  • Lower variability improves instruction following for Kimi
  • Explicit workflow instructions prevent reference hallucination
  • Verify tool call execution before proceeding with content generation
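
The verification step in the last bullet can be sketched as a check over the conversation history. This assumes an OpenAI-style chat message list (assistant messages carrying `tool_calls` with an `id`, tool results carrying `role: "tool"` and `tool_call_id`); that format and the helper names are assumptions for illustration, not a description of the actual platform.

```python
from typing import Any, Dict, List


def unexecuted_tool_calls(messages: List[Dict[str, Any]]) -> List[str]:
    """Return ids of requested tool calls that have no later tool result."""
    pending: List[str] = []
    for message in messages:
        if message.get("role") == "assistant":
            # Record every tool call the model asked for.
            for call in message.get("tool_calls") or []:
                pending.append(call["id"])
        elif message.get("role") == "tool":
            # A tool result resolves the matching request, if any.
            call_id = message.get("tool_call_id")
            if call_id in pending:
                pending.remove(call_id)
    return pending


def ready_for_generation(messages: List[Dict[str, Any]]) -> bool:
    """True only if at least one tool result exists and none are pending."""
    has_result = any(m.get("role") == "tool" for m in messages)
    return has_result and not unexecuted_tool_calls(messages)
```

Calling `ready_for_generation(history)` before the content-generation step covers both failure cases: a tool call that was requested but never executed, and a history in which no tool was called at all.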