Glossary
A comprehensive guide to key terms used throughout the InceptBench framework and educational content evaluation.
A
Answer Verification
A fast AI-powered evaluator that validates the correctness of answers across all subjects. Returns a boolean correctness flag, a confidence level (0-10), and a reasoning explanation.
Article
A complete educational document combining multiple learning components (text, images, embedded questions) into a unified pedagogical experience. Formatted in markdown with hierarchical headings, sequential flow, and contextual integration of mixed-media elements. NEW in InceptBench v1.4.0.
Article Holistic Evaluator
Comprehensive evaluation of articles as unified pedagogical experiences (NEW in v1.4.0). Assesses across 10 dimensions: pedagogical coherence, content organization, scaffolding quality, engagement, mixed-media integration, learning objectives clarity, grade appropriateness, completeness, cognitive load management, and instructional clarity. Returns 0-1 scores, recommendation (accept/revise/reject), issues, strengths, and suggested improvements. Learn more →
B
Benchmark Methodology
A systematic approach using specialized evaluators to assess educational content generation systems. Unlike benchmark results, a methodology defines the framework and evaluation criteria used for assessment.
C
Curriculum Alignment
The degree to which educational content matches specific curriculum standards, grade-level expectations, and learning objectives.
D
Direct Instruction (DI)
A highly structured, teacher-led instructional approach emphasizing clear, explicit teaching of specific skills in a systematic manner. One of the core pedagogical principles of Incept. Learn more →
Distractor
An incorrect answer option in a multiple-choice question designed to test understanding by representing common misconceptions or errors.
E
EduBench (External)
An external open-source benchmark included in InceptBench for diversity and external baseline comparison. Evaluator name: external_edubench. Evaluates across 6 educational tasks (QA, EC, IP, AG, QG, TMG) with scores on a 0-10 scale, averaged then normalized to 0-1. Learn more →
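The description above implies a two-step aggregation: average the six per-task scores on the 0-10 scale, then normalize to 0-1. A minimal sketch of that reading (the function name and averaging scheme are assumptions, not part of the EduBench API):

```python
# Sketch of the described EduBench aggregation: mean of the six task
# scores (0-10 scale), then normalized to 0-1. An assumption based on
# the glossary text, not the actual external_edubench implementation.
def edubench_normalized(task_scores: dict[str, float]) -> float:
    """task_scores maps task codes (e.g. 'QA', 'EC') to 0-10 scores."""
    avg = sum(task_scores.values()) / len(task_scores)  # mean on 0-10 scale
    return avg / 10.0  # normalize to 0-1

score = edubench_normalized(
    {"QA": 8.0, "EC": 7.0, "IP": 9.0, "AG": 6.0, "QG": 8.0, "TMG": 7.0}
)
print(score)  # 0.75
```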
Evaluator
The unified InceptBench evaluation system that automatically assesses educational content quality. The evaluator intelligently routes to specialized internal methods based on content type and parameters—no manual configuration required.
Evaluator Version
Version identifier for evaluators (e.g., v1.0.0) indicating the specific release and capabilities of an evaluation tool.
F
Final Score
An aggregated quality score (0-1 scale) calculated across all evaluators run on a specific question, providing an overall assessment of content quality.
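The glossary does not spell out the aggregation formula. One plausible reading, consistent with most entries in the example response under Response Schema (e.g. 0.911, 0.8, and 1.0 averaging to 0.904), is an unweighted mean of the numeric evaluator scores; treat this sketch as a hypothesis, since the real weighting may differ:

```python
# Hypothetical final-score aggregation: unweighted mean of per-evaluator
# numeric scores, rounded to three decimals. This matches the glossary's
# example values for most entries but is an assumption, not the
# documented InceptBench formula.
def final_score(evaluator_scores: dict[str, float]) -> float:
    return round(sum(evaluator_scores.values()) / len(evaluator_scores), 3)

print(final_score({
    "ti_question_qa": 0.911,
    "reading_question_qc": 0.8,
    "math_content_evaluator": 1.0,
}))  # 0.904
```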
G
Grade Alignment
A dimension measuring how well educational content matches the cognitive and developmental level appropriate for a specific grade.
Generated Article Schema
Input schema for article evaluation (NEW in v1.4.0). Structure:
{
  "type": "article",
  "content": "Full markdown content with headings, text, images, questions...",
  "title": "Article Title",
  "skill": { /* Skill schema */ }
}
Required fields: type (must be “article”) and content (a markdown string). Optional: title, skill. The content should be standard markdown with headings (#, ##, ###), images (![alt](url)), and embedded questions in the format ### Question N, followed by the question text, options, correct answer, and explanation.
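Putting the pieces together, a minimal article payload with one embedded question might look like this (all field values are illustrative):

```python
# Minimal payload following the Generated Article Schema. The embedded
# question uses the "### Question N" markdown convention described above;
# the actual content values are made up for illustration.
article = {
    "type": "article",
    "title": "Adding Fractions",
    "content": (
        "# Adding Fractions\n\n"
        "To add fractions with the same denominator, add the numerators "
        "and keep the denominator.\n\n"
        "### Question 1\n"
        "What is 1/5 + 2/5?\n"
        "A) 2/5  B) 3/5  C) 3/10  D) 1/5\n"
        "Correct answer: B\n"
        "Explanation: Add the numerators: 1 + 2 = 3, so the sum is 3/5.\n"
    ),
}

# Only type and content are required; title and skill are optional.
assert article["type"] == "article" and "content" in article
```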
Generated Content Schema
Input schema for text content evaluation. Structure:
{
  "id": "text1",
  "type": "text | passage | explanation",
  "content": "Educational text content...",
  "title": "Optional title",
  "skill": { /* Skill schema */ },
  "image_url": "optional_image_url",
  "additional_details": "optional_context"
}
Required fields: id, type, content. Optional: title, skill, image_url, additional_details.
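A minimal client-side check of those required fields can be sketched as follows (the real InceptBench validator may enforce more than this, so treat it as a pre-flight convenience, not the authoritative rules):

```python
# Pre-flight check for the Generated Content Schema's required fields and
# type values. A client-side convenience sketch only; server-side
# validation in InceptBench may be stricter.
REQUIRED_FIELDS = ("id", "type", "content")
ALLOWED_TYPES = {"text", "passage", "explanation"}

def validate_content(item: dict) -> list[str]:
    """Return a list of problems; an empty list means the item looks valid."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in item]
    if item.get("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {item.get('type')!r}")
    return problems

print(validate_content({"id": "text1", "type": "passage", "content": "..."}))  # []
```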
Generated Question Schema
Input schema for question evaluation. Structure:
{
  "id": "q1",
  "type": "mcq | fill-in",
  "question": "Question text",
  "answer": "Correct answer",
  "answer_explanation": "Step-by-step explanation",
  "answer_options": {"A": "option1", "B": "option2", ...},
  "skill": { /* Skill schema */ },
  "image_url": "optional_image_url",
  "additional_details": "optional_context"
}
Required fields: id, type, question, answer, answer_explanation. For MCQ: answer_options required. Optional: skill, image_url, additional_details.
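For concreteness, here is what a complete MCQ item could look like under this schema (question text, options, and skill metadata are invented for illustration):

```python
# Example MCQ item under the Generated Question Schema. Since type is
# "mcq", answer_options is required alongside the five base fields; the
# values themselves are illustrative.
question = {
    "id": "q1",
    "type": "mcq",
    "question": "What is 7 x 8?",
    "answer": "B",
    "answer_explanation": "7 groups of 8 make 56, so 7 x 8 = 56.",
    "answer_options": {"A": "54", "B": "56", "C": "58", "D": "64"},
    "skill": {"title": "Multiplication facts", "grade": "3", "subject": "mathematics"},
}

# All five base required fields are present, plus answer_options for MCQ.
assert {"id", "type", "question", "answer", "answer_explanation"} <= set(question)
```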
I
Image Quality DI Evaluator
DI rubric-based pedagogical image quality assessment (NEW in v1.3.0). Automatically enabled when image_url is present in content. Evaluates visual content using weighted criteria: visual clarity, pedagogical value, age-appropriateness, and canonical representation. Provides context-aware evaluation (accompaniment vs standalone modes) with 0-100 scoring normalized to 0-1 scale. Includes hard-fail gates for inappropriate content or answer leakage.
Image Quality Evaluation
Automatic detection and assessment of educational images using Direct Instruction rubric-based scoring (v1.3.0). When any content includes an image_url, image quality evaluation is automatically enabled to ensure visual content meets pedagogical standards. Evaluates both images that accompany text (accompaniment mode) and standalone educational images.
InceptBench
A unified evaluation framework for educational content that automatically routes to specialized assessment methods based on content characteristics. One intelligent evaluator that handles all K-12 subjects and content types without manual configuration. Designed to be target-system agnostic. Current version: v1.4.0.
K
K-12
Kindergarten through 12th grade, representing the full span of primary and secondary education in many educational systems.
M
Math Content Evaluator
Comprehensive content quality assessment across 9 educational criteria: curriculum_alignment, cognitive_demand, accuracy_and_rigor, image_quality, reveals_misconceptions, question_type_appropriateness, engagement_and_relevance, instructional_support, and clarity_and_accessibility. Scale: 0-1 (pass_count / 9). Learn more →
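The stated scale (pass_count / 9) can be sketched directly from the nine criterion names above; the per-criterion pass/fail inputs below are illustrative:

```python
# Sketch of the documented Math Content Evaluator scale: overall score is
# the fraction of the nine criteria that pass. Criterion names come from
# the glossary; the pass/fail results here are made up.
CRITERIA = [
    "curriculum_alignment", "cognitive_demand", "accuracy_and_rigor",
    "image_quality", "reveals_misconceptions", "question_type_appropriateness",
    "engagement_and_relevance", "instructional_support", "clarity_and_accessibility",
]

def math_content_score(passed: dict[str, bool]) -> float:
    """overall_score = pass_count / 9, on a 0-1 scale."""
    return sum(passed.get(c, False) for c in CRITERIA) / len(CRITERIA)

# Example: 7 of 9 criteria pass.
print(round(math_content_score({c: True for c in CRITERIA[:7]}), 3))  # 0.778
```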
Math Image Judge
Vision-based image quality checking using Claude (NEW in v1.3.0). Provides PASS/FAIL binary evaluation for mathematical visual problems with object counting verification. Returns pass_score (1.0 for PASS, 0.0 for FAIL), detailed descriptions, and individual image ratings. Used for advanced vision-based validation of math content images.
MCQ (Multiple Choice Question)
A question format presenting one correct answer alongside multiple distractors, requiring students to identify the correct option.
MTSS (Multi-Tiered System of Supports)
A comprehensive framework of evidence-based practices designed to meet the diverse academic and behavioral needs of all students through tiered interventions. Learn more →
P
Pedagogy
The method and practice of teaching, including instructional strategies, learning theories, and educational principles. Incept pedagogy is grounded in 8 core pillars. Learn more →
Pedagogical Value
A dimension assessing how well educational content promotes effective learning, critical thinking, and skill development.
Q
Quality Control (QC)
Systematic evaluation processes ensuring educational content meets defined standards for accuracy, clarity, and pedagogical effectiveness.
TI Question QA
Internal quality assessment across 10 dimensions: correctness, grade_alignment, difficulty_alignment, language_quality, pedagogical_value, explanation_quality, instruction_adherence, format_compliance, query_relevance, and di_compliance. Ideal for general question quality, pedagogical value, format compliance, and DI adherence. Scale: 0-1. Learn more →
R
Reading Question QC
A specialized evaluator focusing on MCQ quality assessment for reading comprehension, including distractor analysis, question clarity, and standards alignment.
Recommendation
An evaluator output classifying content as “accept” (ready to use), “revise” (needs improvements), or “reject” (does not meet standards).
Request Schema
InceptBench API request structure:
{
  "subject": "math | ela | science | social-studies | general",
  "grade": "K | 1-12",
  "type": "mcq | fill-in | text-content | passage | article",
  "generated_questions": [ /* Array of question objects */ ],
  "generated_content": [ /* Array of text content objects */ ],
  "generated_articles": [ /* Array of article objects (NEW v1.4.0) */ ],
  "verbose": false
}
At least one of generated_questions, generated_content, or generated_articles is required. Optional routing parameters (subject, grade, type) help the evaluator select appropriate methods. Set verbose to false for simplified scores only, or true for full details with issues and strengths. The evaluator automatically determines which internal methods to use based on your content and parameters. See Generated Question Schema, Generated Content Schema, and Generated Article Schema.
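Assembling such a request in code is straightforward; the sketch below only builds and serializes the JSON body (the endpoint URL in the comment is a placeholder assumption, so consult the InceptBench API documentation for the real one):

```python
import json

# Building a minimal evaluation request body per the Request Schema.
# Routing parameters (subject, grade, type) are optional hints; the
# question content is illustrative.
request_body = {
    "subject": "math",
    "grade": "6",
    "type": "mcq",
    "generated_questions": [{
        "id": "q1",
        "type": "mcq",
        "question": "What is 3/4 of 20?",
        "answer": "C",
        "answer_explanation": "One quarter of 20 is 5, so 3/4 of 20 is 15.",
        "answer_options": {"A": "10", "B": "12", "C": "15", "D": "18"},
    }],
    "verbose": False,
}
payload = json.dumps(request_body)
# e.g. POST payload to your InceptBench host's evaluation endpoint
# (placeholder): requests.post("https://<inceptbench-host>/evaluate", data=payload)
```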
Routing Parameters
Optional parameters (subject, grade, type) that help InceptBench automatically select the most appropriate evaluation methods for your content. These parameters enable intelligent routing without manual configuration of evaluation methods.
Response Schema
InceptBench API response structure:
{
  "request_id": "uuid",
  "evaluations": {
    "q1": {
      "ti_question_qa": {"overall": 0.911},
      "answer_verification": {"is_correct": true},
      "reading_question_qc": {"overall_score": 0.8},
      "math_content_evaluator": {"overall_score": 1.0},
      "final_score": 0.904
    },
    "q_with_image": {
      "ti_question_qa": {"overall": 0.850},
      "answer_verification": {"is_correct": true},
      "image_quality_di_evaluator": {"normalized_score": 0.750},
      "final_score": 0.800
    },
    "text1": {
      "math_content_evaluator": {"overall_score": 0.778},
      "text_content_evaluator": {"overall": 0.957},
      "final_score": 0.867
    },
    "article1": {
      "article_holistic_evaluator": {"overall": 0.835, "recommendation": "accept"},
      "text_content_evaluator": {"overall": 0.89},
      "embedded_questions": {"q1": {...}, "q2": {...}},
      "images": {"img1": {...}},
      "final_score": 0.880
    }
  },
  "evaluation_time_seconds": 38.76,
  "inceptbench_version": "1.4.0"
}
Simplified mode (default, verbose: false) returns only overall scores (~95% smaller output). Verbose mode (verbose: true) returns detailed scores, issues, strengths, and recommendations for each evaluation method. Article evaluation (NEW in v1.4.0) includes a holistic evaluation plus component evaluations for text, embedded questions, and images. Image evaluation (NEW in v1.3.0) is automatically included when content has an image_url. Note: evaluation methods apply automatically based on content type and routing parameters; the system selects appropriate methods without manual configuration.
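Clients typically walk the evaluations map keyed by content id. A minimal sketch of extracting each item's final score from a response shaped like the example above (the response dict here is abbreviated but mirrors that structure):

```python
# Extracting per-item final scores from a Response Schema-shaped payload.
# The response dict is abbreviated from the glossary's example; real
# responses carry more evaluator sub-objects per item.
response = {
    "request_id": "uuid",
    "evaluations": {
        "q1": {"ti_question_qa": {"overall": 0.911}, "final_score": 0.904},
        "text1": {"text_content_evaluator": {"overall": 0.957}, "final_score": 0.867},
    },
    "inceptbench_version": "1.4.0",
}

final_scores = {
    content_id: evaluation["final_score"]
    for content_id, evaluation in response["evaluations"].items()
}
print(final_scores)  # {'q1': 0.904, 'text1': 0.867}
```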
S
Scarborough’s Reading Rope
A research-based framework illustrating how multiple strands of language and literacy skills interweave to create skilled reading comprehension. Learn more →
Skill
A specific, well-defined learning objective or competency within a subject and grade level.
Skill Schema
Metadata schema for educational content. Structure:
{
  "title": "Skill or assessment title",
  "grade": "6",
  "subject": "mathematics",
  "difficulty": "easy | medium | hard",
  "description": "Optional description",
  "language": "en | ar"
}
Required fields: title, grade, subject. Optional: description. Defaults when omitted: subject="mathematics", difficulty="medium", language="en".
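The documented defaults can be applied on the client side with a simple merge; this is a convenience sketch, not the framework's own default-handling code:

```python
# Applying the Skill Schema's documented defaults to a partial record.
# Explicitly provided fields win over defaults; a client-side sketch only.
SKILL_DEFAULTS = {"subject": "mathematics", "difficulty": "medium", "language": "en"}

def with_defaults(skill: dict) -> dict:
    return {**SKILL_DEFAULTS, **skill}

print(with_defaults({"title": "Equivalent fractions", "grade": "4"}))
```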
Subject
An academic discipline or area of study (e.g., Mathematics, Reading Comprehension, Science).
T
Target System
The educational content generation system being evaluated by InceptBench. The framework is designed to be system-agnostic and work with any K-12 content generator.
Text Content
Educational passages, explanations, and other text materials that can be evaluated for pedagogical quality using InceptBench (distinct from questions).
Text Content Evaluator
Comprehensive pedagogical assessment of educational text content across 8 dimensions: correctness, grade_alignment, language_quality, pedagogical_value, explanation_quality, di_compliance, instruction_adherence, and query_relevance. Ideal for evaluating passages, explanations, and educational text materials. Applies to text content only. Scale: 0-1. Learn more →
Related Resources
- Pedagogy Overview - Understand the 8 pillars of Incept pedagogy
- Benchmark Methodologies - Explore available benchmarking frameworks
- Evaluators - Learn about evaluation tools and their capabilities