InceptBench

Educational content evaluation framework grounded in Incept pedagogy. InceptBench provides one unified evaluator that automatically selects the appropriate evaluation methods based on your content characteristics.

How It Works

Provide your educational content along with optional routing parameters (subject, grade, type), and InceptBench determines the evaluation approach. You do not choose which internal evaluation methods run; the evaluator selects them for you.

Content Types Supported

  • Questions: MCQ (Multiple Choice) and Fill-in questions
  • Text Content: Educational passages, explanations, and text materials
  • Articles: Complete educational documents with markdown formatting, mixed media, and embedded questions (NEW in v1.4.0)
  • Visual Content: Images accompanying questions or standalone educational images

All content types are evaluated for pedagogical value, accuracy, grade alignment, and Direct Instruction compliance. Images are automatically detected and evaluated when image_url is provided.

Quick Start

Evaluate educational content using the InceptBench API endpoint:

# Basic evaluation - automatic routing
curl -X POST "https://api.inceptapi.com/evaluate" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer INCEPT_API_KEY" \
  -d @qs.json

# With subject and grade for better routing
curl -X POST "https://api.inceptapi.com/evaluate?subject=math&grade=6-8" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer INCEPT_API_KEY" \
  -d @qs.json

# Full detailed results
curl -X POST "https://api.inceptapi.com/evaluate?subject=ela&verbose=true" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer INCEPT_API_KEY" \
  -d @qs.json

Replace INCEPT_API_KEY with your actual API key and qs.json with your input file path.
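The same request can be built from Python. The following is a minimal sketch using only the standard library. It assumes the routing parameters are passed as query parameters exactly as in the curl examples above; the helper names build_evaluate_request and evaluate are illustrative and not part of any InceptBench client.

```python
import json
import urllib.parse
import urllib.request

EVALUATE_URL = "https://api.inceptapi.com/evaluate"


def build_evaluate_request(payload, api_key, **routing):
    """Build (but do not send) a POST request like the curl examples."""
    # Skip unset parameters; booleans become "true"/"false" as in the curl examples.
    params = {
        key: (str(value).lower() if isinstance(value, bool) else str(value))
        for key, value in routing.items()
        if value is not None
    }
    url = EVALUATE_URL
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def evaluate(payload, api_key, **routing):
    """Send the request and return the decoded JSON response."""
    request = build_evaluate_request(payload, api_key, **routing)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling evaluate(payload, api_key, subject="math", grade="6-8") would correspond to the second curl example. Keeping the request construction separate from the network call makes it possible to inspect the URL and headers without contacting the API.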

Routing Parameters

Provide these optional parameters to help InceptBench select the most appropriate evaluation methods:

  • subject: Subject area (math, ela, science, social-studies, general)
  • grade: Grade level (e.g., "K", "3", "6-8", "9-12")
  • type: Content type (mcq, fill-in, short-answer, essay, text-content, passage)
  • verbose: Set to true for the full detailed evaluation (default: false, which returns simplified scores only)
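A client can check these values before sending a request. The sketch below is a client-side convenience only: it assumes the subject and type values listed above are the complete set, and it leaves grade unvalidated because the list above gives examples rather than an exhaustive set. Whether the API itself rejects other values is not stated here. The function name routing_params is illustrative.

```python
ALLOWED_SUBJECTS = {"math", "ela", "science", "social-studies", "general"}
ALLOWED_TYPES = {"mcq", "fill-in", "short-answer", "essay", "text-content", "passage"}


def routing_params(subject=None, grade=None, content_type=None, verbose=False):
    """Return a dict of routing parameters, rejecting unlisted subject/type values."""
    if subject is not None and subject not in ALLOWED_SUBJECTS:
        raise ValueError(f"unknown subject: {subject!r}")
    if content_type is not None and content_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown type: {content_type!r}")
    params = {}
    if subject is not None:
        params["subject"] = subject
    if grade is not None:
        # Grade is kept as a free-form string ("K", "3", "6-8", "9-12").
        params["grade"] = str(grade)
    if content_type is not None:
        params["type"] = content_type
    if verbose:
        params["verbose"] = "true"
    return params


print(routing_params(subject="math", grade="6-8"))
# prints {'subject': 'math', 'grade': '6-8'}
```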

Evaluation Methods

InceptBench uses multiple specialized evaluation methods under the hood, selected automatically from your content type and routing parameters. You do not specify which methods to use.

How Evaluation Methods Are Selected

The evaluator automatically routes your content to the appropriate evaluation methods:

  • Math Questions → Quality assessment + Answer verification + Math content evaluation
  • ELA/Reading Questions → Quality assessment + Answer verification + Reading QC + Distractor analysis
  • Text Content → Text content evaluation + Subject-specific assessment
  • Articles → Holistic article evaluation + Component evaluation (text, questions, images) (NEW in v1.4.0)
  • Content with Images → DI image quality evaluation (automatically enabled)
  • General Content → Core quality assessment + Content-appropriate specialized evaluation

These methods run in parallel for fast evaluation, and their scores are combined into a final quality score (0-1 scale).
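This page does not state how the scores are combined. As an observation only: in the sample simplified output under Output Format, the final_score values for q1 and q2 agree with an unweighted mean of the per-evaluator overall scores (answer_verification contributes no numeric score there). The article1 sample does not follow a flat mean, so the sketch below illustrates the idea of combining scores and should not be read as the actual formula. The function name is hypothetical.

```python
from statistics import mean


def mean_of_overall_scores(item):
    """Average every evaluator's "overall" or "overall_score" value for one item."""
    scores = []
    for result in item.values():
        if not isinstance(result, dict):
            continue  # skips final_score and other non-evaluator fields
        for key in ("overall", "overall_score"):
            if key in result:
                scores.append(result[key])
    return mean(scores)


# Values copied from the q1 entry of the sample simplified output.
q1 = {
    "ti_question_qa": {"overall": 0.911},
    "answer_verification": {"is_correct": True},
    "reading_question_qc": {"overall_score": 0.8},
    "math_content_evaluator": {"overall_score": 1.0},
    "final_score": 0.904,
}

print(round(mean_of_overall_scores(q1), 3))  # prints 0.904
```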

New in v1.4.0: Article holistic evaluation assessing complete educational documents as unified pedagogical experiences.

New in v1.3.0: Automatic image detection and evaluation using Direct Instruction rubric-based scoring.

Technical Reference

For developers who need technical details about the evaluation methods:

  • Quality Assessment: 10-dimension pedagogical quality scoring (correctness, grade alignment, DI compliance, etc.)
  • Answer Verification: AI-powered correctness validation with confidence scoring
  • Reading QC: MCQ distractor quality analysis and passage alignment checks
  • Math Content Evaluation: Curriculum alignment, cognitive demand, and rigor assessment across 9 criteria
  • Text Content Evaluation: Pedagogical assessment of explanatory text across 8 dimensions
  • Article Holistic Evaluation (NEW v1.4.0): Unified pedagogical experience assessment across 10 dimensions (pedagogical coherence, content organization, scaffolding quality, engagement, mixed-media integration, learning objectives clarity, grade appropriateness, completeness, cognitive load management, instructional clarity)
  • Image Quality DI Evaluation (NEW v1.3.0): DI rubric-based pedagogical image quality assessment (0-100 scale, auto-enabled for images)
  • Math Image Judge (NEW v1.3.0): Vision-based image quality checking using Claude (PASS/FAIL)

Image Evaluation Features:

  • Automatic detection when image_url is present
  • Context-aware evaluation (accompaniment vs standalone modes)
  • DI rubric scoring with weighted criteria (visual clarity, pedagogical value, age-appropriateness, canonical representation)
  • Hard-fail gates for inappropriate content or answer leakage
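Because image evaluation is triggered by the presence of image_url, it can be useful to know in advance which items in a request will be routed to it. The sketch below is a client-side helper, not InceptBench code. It assumes every content item is an object with id and image_url fields, as in the sample question under Input Format; that shape is an assumption for generated_content and generated_articles items.

```python
CONTENT_KEYS = ("generated_questions", "generated_content", "generated_articles")


def items_with_images(payload):
    """Return the ids of content items whose image_url is set (non-null, non-empty)."""
    ids = []
    for key in CONTENT_KEYS:
        for item in payload.get(key, []):
            if item.get("image_url"):
                ids.append(item.get("id"))
    return ids


payload = {
    "generated_questions": [
        {"id": "q1", "image_url": None},
        # example.com is a placeholder URL for illustration.
        {"id": "q2", "image_url": "https://example.com/diagram.png"},
    ]
}

print(items_with_images(payload))  # prints ['q2']
```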

Note: The specific evaluation methods used are implementation details and may be enhanced over time. Always use the routing parameters (subject, grade, type) rather than trying to manually configure evaluation methods.

Input Format

InceptBench accepts educational content under three top-level fields:

  1. generated_questions - MCQ and fill-in questions (traditional)
  2. generated_content - Educational text, passages, explanations
  3. generated_articles - Complete educational documents with markdown formatting (NEW in v1.4.0)

You can provide one or more types in the same request. The full request and response schemas are listed in the glossary.

{
  "subject": "math",
  "grade": "6",
  "type": "mcq",
  "generated_questions": [
    {
      "id": "q1",
      "type": "mcq",
      "question": "إذا كان ثمن 2 قلم هو 14 ريالًا، فما ثمن 5 أقلام بنفس المعدل؟",
      "answer": "35 ريالًا",
      "answer_explanation": "الخطوة 1: تحليل المسألة — لدينا ثمن 2 قلم وهو 14 ريالًا. نحتاج إلى معرفة ثمن 5 أقلام بنفس المعدل. يجب التفكير في العلاقة بين عدد الأقلام والسعر وكيفية تحويل عدد الأقلام بمعدل ثابت.\nالخطوة 2: تطوير الاستراتيجية — يمكننا أولًا إيجاد ثمن قلم واحد بقسمة 14 ÷ 2 = 7 ريال، ثم ضربه في 5 لإيجاد ثمن 5 أقلام: 7 × 5 = 35 ريالًا.\nالخطوة 3: التطبيق والتحقق — نتحقق من منطقية الإجابة بمقارنة السعر بعدد الأقلام. السعر يتناسب طرديًا مع العدد، وبالتالي 35 ريالًا هي الإجابة الصحيحة والمنطقية.",
      "answer_options": {
        "A": "28 ريالًا",
        "B": "70 ريالًا",
        "C": "30 ريالًا",
        "D": "35 ريالًا"
      },
      "skill": {
        "title": "Grade 6 Mid-Year Comprehensive Assessment",
        "grade": "6",
        "subject": "mathematics",
        "difficulty": "medium",
        "description": "Apply proportional reasoning, rational number operations, algebraic thinking, geometric measurement, and statistical analysis to solve multi-step real-world problems",
        "language": "ar"
      },
      "image_url": null,
      "additional_details": "🔹 **Question generation logic:**\nThis question targets proportional reasoning for Grade 6 students, testing their ability to apply ratios and unit rates to real-world problems. It follows a classic proportionality structure — starting with a known ratio (2 items for 14 riyals) and scaling it up to 5 items. The stepwise reasoning develops algebraic thinking and promotes estimation checks to confirm logical correctness.\n\n🔹 **Personalized insight examples:**\n- Choosing 28 ريالًا shows a misunderstanding by doubling instead of proportionally scaling.\n- Choosing 7 ريالًا indicates the learner found the unit rate but didn't scale it up to 5.\n- Choosing 14 ريالًا confuses the given 2-item cost with the required 5-item cost.\n\n🔹 **Instructional design & DI integration:**\nThe question aligns with *Percent, Ratio, and Probability* learning targets. In DI format 15.7, it models how equivalent fractions and proportional relationships can predict outcomes across different scales. This builds foundational understanding for probability and proportional reasoning. By using a simple, relatable context (price of pens), it connects mathematical ratios to practical real-world applications, supporting concept transfer and cognitive engagement."
    }
  ],
  "verbose": false
}
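A payload like the one above can also be assembled in code and sanity-checked before submission. The sketch below uses an English adaptation of the sample question with a reduced set of fields. The check_mcq helper is hypothetical; it assumes an MCQ answer should match either an option key or an option's text, which is a client-side convention inferred from the sample rather than a documented requirement.

```python
import json


def check_mcq(question):
    """Return a list of problems found in one MCQ item (empty if none)."""
    problems = []
    if question.get("type") != "mcq":
        return problems
    options = question.get("answer_options") or {}
    answer = question.get("answer")
    if not options:
        problems.append("mcq question has no answer_options")
    elif answer not in options and answer not in options.values():
        problems.append("answer matches neither an option key nor an option text")
    return problems


question = {
    "id": "q1",
    "type": "mcq",
    "question": "If 2 pens cost 14 riyals, how much do 5 pens cost at the same rate?",
    "answer": "35 riyals",
    "answer_options": {"A": "28 riyals", "B": "70 riyals", "C": "30 riyals", "D": "35 riyals"},
    "image_url": None,
}

payload = {
    "subject": "math",
    "grade": "6",
    "type": "mcq",
    "generated_questions": [question],
    "verbose": False,
}

assert check_mcq(question) == []
# ensure_ascii=False keeps non-ASCII text (such as Arabic) readable in the file.
body = json.dumps(payload, ensure_ascii=False, indent=2)
```

Writing body to qs.json produces a file that can be passed to the curl commands in Quick Start.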

Output Format

Simplified (Default, verbose: false)

{
  "request_id": "06c031fd-6517-4874-8117-2dbeb5554291",
  "evaluations": {
    "q1": {
      "ti_question_qa": {
        "overall": 0.911
      },
      "answer_verification": {
        "is_correct": true
      },
      "reading_question_qc": {
        "overall_score": 0.8
      },
      "math_content_evaluator": {
        "overall_score": 1.0
      },
      "final_score": 0.904
    },
    "q2": {
      "ti_question_qa": {
        "overall": 0.933
      },
      "answer_verification": {
        "is_correct": true
      },
      "reading_question_qc": {
        "overall_score": 0.778
      },
      "math_content_evaluator": {
        "overall_score": 0.778
      },
      "final_score": 0.830
    },
    "text1": {
      "math_content_evaluator": {
        "overall_score": 0.778
      },
      "text_content_evaluator": {
        "overall": 0.957
      },
      "final_score": 0.867
    },
    "text2": {
      "math_content_evaluator": {
        "overall_score": 0.778
      },
      "text_content_evaluator": {
        "overall": 0.957
      },
      "final_score": 0.867
    },
    "article1": {
      "article_holistic_evaluator": {
        "overall": 0.835,
        "recommendation": "accept"
      },
      "text_content_evaluator": {
        "overall": 0.89
      },
      "embedded_questions": {
        "q1": {
          "ti_question_qa": {"overall": 0.87}
        },
        "q2": {
          "ti_question_qa": {"overall": 0.90}
        }
      },
      "images": {
        "img1": {
          "image_quality_di_evaluator": {"overall": 0.95}
        }
      },
      "final_score": 0.880
    }
  },
  "evaluation_time_seconds": 38.76
}

Note: Evaluation methods are applied automatically based on content type and routing parameters, so which evaluator keys appear for an item depends on its content. You do not configure them manually.
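A simplified response can be reduced to a per-item summary with a few lines of code. The sketch below assumes the response shape shown above (an evaluations object keyed by item id, each with a final_score). The 0.85 threshold and the summarize name are hypothetical choices for illustration, not values defined by InceptBench.

```python
def summarize(response, threshold=0.85):
    """Return (final_scores, flagged_ids) from a simplified response."""
    final_scores = {}
    flagged = []
    for item_id, result in response["evaluations"].items():
        final_scores[item_id] = result["final_score"]
        # answer_verification is only present for some content types.
        verification = result.get("answer_verification", {})
        wrong_answer = verification.get("is_correct") is False
        if result["final_score"] < threshold or wrong_answer:
            flagged.append(item_id)
    return final_scores, flagged


# Trimmed copy of the sample response, keeping only the fields used here.
response = {
    "evaluations": {
        "q1": {"answer_verification": {"is_correct": True}, "final_score": 0.904},
        "q2": {"answer_verification": {"is_correct": True}, "final_score": 0.830},
        "text1": {"final_score": 0.867},
    }
}

scores, flagged = summarize(response)
print(flagged)  # prints ['q2']
```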

Full Mode (verbose: true)

Returns detailed scores, issues, strengths, and recommendations for each evaluator.

{
  "request_id": "06c031fd-6517-4874-8117-2dbeb5554291",
  "evaluations": {
    "q1": {
      "ti_question_qa": {
        "overall": 0.911,
        "scores": {
          "correctness": 1.0,
          "grade_alignment": 0.9,
          "difficulty_alignment": 0.9,
          "language_quality": 0.85,
          "pedagogical_value": 0.95,
          "explanation_quality": 0.9,
          "instruction_adherence": 0.9,
          "format_compliance": 1.0,
          "query_relevance": 1.0,
          "di_compliance": 0.9
        },
        "issues": [],
        "strengths": ["Clear scaffolded explanation", "Excellent proportional reasoning"],
        "recommendation": "accept",
        "suggested_improvements": [],
        "di_scores": {...},
        "section_evaluations": {...}
      },
      "answer_verification": {
        "is_correct": true,
        "correct_answer": "35 riyals",
        "confidence": 10,
        "reasoning": "The answer correctly applies proportional reasoning..."
      },
      "reading_question_qc": {
        "overall_score": 0.8,
        "distractor_checks": {...},
        "question_checks": {...},
        "passed": true
      },
      "math_content_evaluator": {
        "overall_score": 1.0,
        "overall_rating": "SUPERIOR",
        "curriculum_alignment": "PASS",
        "cognitive_demand": "PASS",
        "accuracy_and_rigor": "PASS",
        "reveals_misconceptions": "PASS",
        "question_type_appropriateness": "PASS",
        "engagement_and_relevance": "PASS",
        "instructional_support": "PASS",
        "clarity_and_accessibility": "PASS",
        "pass_count": 9,
        "fail_count": 0
      },
      "final_score": 0.904
    },
    "text1": {
      "math_content_evaluator": {
        "overall_score": 0.778,
        "overall_rating": "ACCEPTABLE",
        "pass_count": 7,
        "fail_count": 2
      },
      "text_content_evaluator": {
        "overall": 0.957,
        "correctness": 1.0,
        "grade_alignment": 0.95,
        "language_quality": 0.9,
        "pedagogical_value": 0.95,
        "explanation_quality": 1.0,
        "di_compliance": 0.9,
        "instruction_adherence": 0.95,
        "query_relevance": 1.0,
        "recommendation": "accept",
        "issues": [],
        "strengths": ["Clear conceptual explanation", "Age-appropriate language"],
        "suggested_improvements": ["Add more real-world examples"],
        "di_scores": {...}
      },
      "final_score": 0.867
    },
    "article1": {
      "article_holistic_evaluator": {
        "pedagogical_coherence": 0.85,
        "content_organization": 0.90,
        "scaffolding_quality": 0.80,
        "engagement": 0.75,
        "mixed_media_integration": 0.85,
        "learning_objectives_clarity": 0.80,
        "grade_appropriateness": 0.90,
        "completeness": 0.85,
        "cognitive_load_management": 0.80,
        "instructional_clarity": 0.85,
        "overall": 0.835,
        "recommendation": "accept",
        "issues": [
          "Some transitions between sections could be smoother",
          "Question 1 appears slightly before full concept explanation"
        ],
        "strengths": [
          "Excellent use of visual diagrams to support text",
          "Clear learning progression from basic to advanced",
          "Engaging real-world examples throughout"
        ],
        "suggested_improvements": [
          "Add transitional sentences between sections",
          "Consider moving Question 1 after more detailed concept introduction"
        ]
      },
      "text_content_evaluator": {
        "overall": 0.89,
        "correctness": 0.95,
        "grade_alignment": 0.90,
        "language_quality": 0.85,
        "pedagogical_value": 0.90,
        "explanation_quality": 0.88,
        "di_compliance": 0.87,
        "instruction_adherence": 0.92,
        "query_relevance": 0.95
      },
      "embedded_questions": {
        "q1": {
          "ti_question_qa": {
            "overall": 0.87,
            "recommendation": "accept"
          },
          "answer_verification": {
            "is_correct": true,
            "confidence": 9
          }
        },
        "q2": {
          "ti_question_qa": {
            "overall": 0.90,
            "recommendation": "accept"
          },
          "answer_verification": {
            "is_correct": true,
            "confidence": 10
          }
        }
      },
      "images": {
        "img1": {
          "image_quality_di_evaluator": {
            "overall": 0.95,
            "score": 95,
            "recommendation": "accept"
          }
        }
      },
      "final_score": 0.880
    }
  },
  "evaluation_time_seconds": 38.76
}

Full mode includes detailed dimension scores, issues, strengths, recommendations, DI compliance breakdowns, and section-level evaluations. For articles (NEW in v1.4.0), it also includes the holistic evaluation across 10 dimensions plus component-level evaluations for text, embedded questions, and images.
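In full mode, issues and suggested_improvements appear at different nesting depths (for example, inside an article's evaluators). The sketch below walks a verbose evaluations object recursively and collects them with their location. It assumes only the field names visible in the sample above; collect_feedback is an illustrative name.

```python
def collect_feedback(node, path=()):
    """Yield (location, field, message) for each issue or suggested improvement."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        if key in ("issues", "suggested_improvements") and isinstance(value, list):
            for message in value:
                yield "/".join(path), key, message
        else:
            # Descend into evaluators and nested components such as embedded_questions.
            yield from collect_feedback(value, path + (key,))


# Trimmed copy of the sample verbose output.
evaluations = {
    "text1": {
        "text_content_evaluator": {
            "issues": [],
            "suggested_improvements": ["Add more real-world examples"],
        },
        "final_score": 0.867,
    },
    "article1": {
        "article_holistic_evaluator": {
            "issues": ["Some transitions between sections could be smoother"],
            "suggested_improvements": [],
        },
    },
}

for location, field, message in collect_feedback(evaluations):
    print(f"{location} [{field}]: {message}")
```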

Workflow Summary

Simple 3-Step Process:

  1. Prepare your content → Structure your questions, text content, or articles in JSON format (include image_url for visual content)
  2. Add routing parameters → Optionally specify subject, grade, and type for better evaluation routing
  3. Send to API → POST to https://api.inceptapi.com/evaluate and receive comprehensive quality scores

The evaluator automatically selects and runs the appropriate evaluation methods based on your content and parameters. No manual configuration needed.

Article Evaluation (v1.4.0): Complete educational documents evaluated holistically as unified pedagogical experiences, with automatic component evaluation for text, questions, and images.

Automatic Image Detection (v1.3.0): If any content includes an image_url, image quality evaluation is automatically enabled using DI rubric-based assessment.

Current Version: InceptBench v1.4.0

Resources

  • API Endpoint: https://api.inceptapi.com/evaluate
  • API Documentation: Contact support for API key and detailed documentation

For questions or support, please contact the Incept team.