InceptBench
Educational content evaluation framework grounded in Incept pedagogy. InceptBench provides one unified evaluator that automatically selects the appropriate evaluation methods based on your content characteristics.
How It Works
Provide your educational content along with optional routing parameters (subject, grade, type), and InceptBench determines the best evaluation approach. You do not need to choose which internal evaluation methods to run; that is handled automatically.
Content Types Supported
- Questions: MCQ (Multiple Choice) and Fill-in questions
- Text Content: Educational passages, explanations, and text materials
- Articles: Complete educational documents with markdown formatting, mixed media, and embedded questions (NEW in v1.4.0)
- Visual Content: Images accompanying questions or standalone educational images
All content types are evaluated for pedagogical value, accuracy, grade alignment, and Direct Instruction compliance. Images are automatically detected and evaluated when image_url is provided.
Quick Start
Evaluate educational content using the InceptBench API endpoint:
# Basic evaluation - automatic routing
curl -X POST "https://api.inceptapi.com/evaluate" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer INCEPT_API_KEY" \
-d @qs.json
# With subject and grade for better routing
curl -X POST "https://api.inceptapi.com/evaluate?subject=math&grade=6-8" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer INCEPT_API_KEY" \
-d @qs.json
# Full detailed results
curl -X POST "https://api.inceptapi.com/evaluate?subject=ela&verbose=true" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer INCEPT_API_KEY" \
-d @qs.json
Replace INCEPT_API_KEY with your actual API key and qs.json with your input file path.
Routing Parameters
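Provide these optional parameters to help InceptBench select the most appropriate evaluation methods. They can be passed in the query string (as in the curl examples above) or as top-level fields of the JSON request body (as in the Input Format example below).
- subject: Subject area (math, ela, science, social-studies, general)
- grade: Grade level (e.g., "K", "3", "6-8", "9-12")
- type: Content type (mcq, fill-in, short-answer, essay, text-content, passage)
- verbose: Set to true for full detailed evaluation (default: false, simplified scores only). This controls output detail rather than routing.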
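The query-string form used in the Quick Start can be assembled from whichever routing parameters you have. The sketch below uses a hypothetical build_evaluate_url function that takes subject, grade, type, and verbose positionally. Empty arguments are skipped. It does no URL-encoding because the documented values contain only letters, digits, and hyphens.

```sh
# Hypothetical helper: build the /evaluate URL from the optional routing
# parameters. Arguments are positional: subject grade type verbose.
# Pass "" to skip one; empty values are left out of the query string.
# No URL-encoding is performed (the documented values need none).
build_evaluate_url() {
  url="https://api.inceptapi.com/evaluate"
  query=""
  for pair in "subject=$1" "grade=$2" "type=$3" "verbose=$4"; do
    value=${pair#*=}            # text after the first "="
    [ -n "$value" ] || continue # skip parameters that were not given
    if [ -z "$query" ]; then
      query=$pair
    else
      query="$query&$pair"
    fi
  done
  if [ -n "$query" ]; then
    printf '%s?%s\n' "$url" "$query"
  else
    printf '%s\n' "$url"
  fi
}

# Reproduces the second and third Quick Start URLs:
build_evaluate_url math 6-8 "" ""  # .../evaluate?subject=math&grade=6-8
build_evaluate_url ela "" "" true  # .../evaluate?subject=ela&verbose=true
```

You can pass the result straight to curl, for example curl -X POST "$(build_evaluate_url math 6-8 '' '')" with the same headers and -d @qs.json as in the Quick Start.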
Evaluation Methods
InceptBench runs several specialized evaluation methods under the hood. It selects them automatically from your content type and routing parameters, so you do not need to configure them manually. The methods are summarized below.
TI Question QA
Internal quality assessment across 10 dimensions
- Correctness & grade alignment
- DI compliance & pedagogical value
- Format & instruction adherence
- Language quality & clarity
- Detailed recommendations
Math Content Evaluator
Comprehensive content quality across 9 criteria
- Curriculum alignment validation
- Cognitive demand assessment
- Accuracy & rigor checks
- Pedagogical design evaluation
- Clarity & accessibility scoring
Answer Verification
Fast, independent answer correctness validation
- Lightning-fast verification
- Independent AI validation
- Confidence scoring (0-10)
- Reasoning explanations
- Works across all subjects
Reading Question QC
Specialized MCQ quality and distractor analysis
- Distractor quality analysis
- Grammatical consistency checks
- Answer plausibility scoring
- Question clarity assessment
- Standards alignment verification
Text Content Evaluator
Pedagogical assessment for educational text
- Correctness & factual accuracy
- Grade alignment & clarity
- DI compliance evaluation
- Pedagogical value scoring
- Accept/revise/reject recommendations
How Evaluation Methods Are Selected
The evaluator automatically routes your content to the appropriate evaluation methods:
- Math Questions → Quality assessment + Answer verification + Math content evaluation
- ELA/Reading Questions → Quality assessment + Answer verification + Reading QC + Distractor analysis
- Text Content → Text content evaluation + Subject-specific assessment
- Articles → Holistic article evaluation + Component evaluation (text, questions, images) (NEW in v1.4.0)
- Content with Images → DI image quality evaluation (automatically enabled)
- General Content → Core quality assessment + Content-appropriate specialized evaluation
These methods run in parallel for fast evaluation, and their scores are combined into a final quality score (0-1 scale).
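This section does not say how per-evaluator scores are combined. In the sample simplified response under Output Format, the final_score values for q1 and q2 equal the unweighted mean of their numeric evaluator scores, with answer_verification's boolean left out. The hypothetical mean_score helper below applies that rule as an assumption, which can be useful for sanity-checking results. It does not reproduce article1's 0.880, so it is an approximation and not the actual InceptBench formula.

```sh
# Hypothetical helper: unweighted mean of the scores given as arguments,
# rounded to three decimals. Using a plain mean is an assumption inferred
# from the sample response, not the documented InceptBench formula.
mean_score() {
  if [ "$#" -eq 0 ]; then
    echo "usage: mean_score SCORE..." >&2
    return 1
  fi
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# q1: ti_question_qa, reading_question_qc, math_content_evaluator
mean_score 0.911 0.8 1.0      # prints 0.904
# q2: same evaluators
mean_score 0.933 0.778 0.778  # prints 0.830
```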
New in v1.4.0: Article holistic evaluation assessing complete educational documents as unified pedagogical experiences.
New in v1.3.0: Automatic image detection and evaluation using Direct Instruction rubric-based scoring.
Technical Reference
For developers who need technical details about the evaluation methods:
- Quality Assessment: 10-dimension pedagogical quality scoring (correctness, grade alignment, DI compliance, etc.)
- Answer Verification: AI-powered correctness validation with confidence scoring
- Reading QC: MCQ distractor quality analysis and passage alignment checks
- Math Content Evaluation: Curriculum alignment, cognitive demand, and rigor assessment across 9 criteria
- Text Content Evaluation: Pedagogical assessment of explanatory text across 8 dimensions
- Article Holistic Evaluation (NEW v1.4.0): Unified pedagogical experience assessment across 10 dimensions (pedagogical coherence, content organization, scaffolding quality, engagement, mixed-media integration, learning objectives clarity, grade appropriateness, completeness, cognitive load management, instructional clarity)
- Image Quality DI Evaluation (NEW v1.3.0): DI rubric-based pedagogical image quality assessment (0-100 scale, auto-enabled for images)
- Math Image Judge (NEW v1.3.0): Vision-based image quality checking using Claude (PASS/FAIL)
Image Evaluation Features:
- Automatic detection when image_url is present
- Context-aware evaluation (accompaniment vs. standalone modes)
- DI rubric scoring with weighted criteria (visual clarity, pedagogical value, age-appropriateness, canonical representation)
- Hard-fail gates for inappropriate content or answer leakage
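Image evaluation is switched on by the presence of image_url, so it can help to check a request file before sending it. The hypothetical has_image_url function below is a grep heuristic, not a JSON parser. It succeeds when some "image_url" key has a non-empty string value, so "image_url": null (as in the Input Format example) does not count. It inspects only image_url fields.

```sh
# Hypothetical pre-flight check: succeed if the JSON file passed as $1
# contains an "image_url" key whose value is a non-empty string.
# grep-based heuristic only; it does not parse JSON.
has_image_url() {
  grep -Eq '"image_url"[[:space:]]*:[[:space:]]*"[^"]+"' "$1"
}

# Example:
# if has_image_url qs.json; then echo "image evaluation expected"; fi
```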
Note: The specific evaluation methods used are implementation details and may be enhanced over time. Always use the routing parameters (subject, grade, type) rather than trying to manually configure evaluation methods.
Input Format
InceptBench supports three types of educational content:
- generated_questions: MCQ and fill-in questions (traditional)
- generated_content: Educational text, passages, and explanations
- generated_articles: Complete educational documents with markdown formatting (NEW in v1.4.0)
You can provide one or more types in the same request. See the full Request Schema and Response Schema in the glossary.
{
"subject": "math",
"grade": "6",
"type": "mcq",
"generated_questions": [
{
"id": "q1",
"type": "mcq",
"question": "إذا كان ثمن 2 قلم هو 14 ريالًا، فما ثمن 5 أقلام بنفس المعدل؟",
"answer": "35 ريالًا",
"answer_explanation": "الخطوة 1: تحليل المسألة — لدينا ثمن 2 قلم وهو 14 ريالًا. نحتاج إلى معرفة ثمن 5 أقلام بنفس المعدل. يجب التفكير في العلاقة بين عدد الأقلام والسعر وكيفية تحويل عدد الأقلام بمعدل ثابت.\nالخطوة 2: تطوير الاستراتيجية — يمكننا أولًا إيجاد ثمن قلم واحد بقسمة 14 ÷ 2 = 7 ريال، ثم ضربه في 5 لإيجاد ثمن 5 أقلام: 7 × 5 = 35 ريالًا.\nالخطوة 3: التطبيق والتحقق — نتحقق من منطقية الإجابة بمقارنة السعر بعدد الأقلام. السعر يتناسب طرديًا مع العدد، وبالتالي 35 ريالًا هي الإجابة الصحيحة والمنطقية.",
"answer_options": {
"A": "28 ريالًا",
"B": "70 ريالًا",
"C": "30 ريالًا",
"D": "35 ريالًا"
},
"skill": {
"title": "Grade 6 Mid-Year Comprehensive Assessment",
"grade": "6",
"subject": "mathematics",
"difficulty": "medium",
"description": "Apply proportional reasoning, rational number operations, algebraic thinking, geometric measurement, and statistical analysis to solve multi-step real-world problems",
"language": "ar"
},
"image_url": null,
"additional_details": "🔹 **Question generation logic:**\nThis question targets proportional reasoning for Grade 6 students, testing their ability to apply ratios and unit rates to real-world problems. It follows a classic proportionality structure — starting with a known ratio (2 items for 14 riyals) and scaling it up to 5 items. The stepwise reasoning develops algebraic thinking and promotes estimation checks to confirm logical correctness.\n\n🔹 **Personalized insight examples:**\n- Choosing 28 ريالًا shows a misunderstanding by doubling instead of proportionally scaling.\n- Choosing 7 ريالًا indicates the learner found the unit rate but didn't scale it up to 5.\n- Choosing 14 ريالًا confuses the given 2-item cost with the required 5-item cost.\n\n🔹 **Instructional design & DI integration:**\nThe question aligns with *Percent, Ratio, and Probability* learning targets. In DI format 15.7, it models how equivalent fractions and proportional relationships can predict outcomes across different scales. This builds foundational understanding for probability and proportional reasoning. By using a simple, relatable context (price of pens), it connects mathematical ratios to practical real-world applications, supporting concept transfer and cognitive engagement."
}
],
"verbose": false
}
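For a first request, you can cut the example above down and write it to qs.json, the file name the Quick Start commands read, using a shell heredoc. The sketch below is an English rendering of the same question that keeps only fields appearing in the example. The Request Schema determines whether the omitted skill and additional_details fields are optional, so treat this trimmed form as an assumption.

```sh
# Write a minimal one-question request to qs.json, the file name the
# Quick Start commands read. English version of the example above,
# trimmed to fields that appear there; check the Request Schema for
# which fields are actually required.
cat > qs.json <<'EOF'
{
  "subject": "math",
  "grade": "6",
  "type": "mcq",
  "generated_questions": [
    {
      "id": "q1",
      "type": "mcq",
      "question": "If 2 pens cost 14 riyals, how much do 5 pens cost at the same rate?",
      "answer": "35 riyals",
      "answer_explanation": "One pen costs 14 / 2 = 7 riyals, so 5 pens cost 7 x 5 = 35 riyals.",
      "answer_options": {
        "A": "28 riyals",
        "B": "70 riyals",
        "C": "30 riyals",
        "D": "35 riyals"
      },
      "image_url": null
    }
  ],
  "verbose": false
}
EOF
```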
Output Format
Simplified (Default, verbose: false)
{
"request_id": "06c031fd-6517-4874-8117-2dbeb5554291",
"evaluations": {
"q1": {
"ti_question_qa": {
"overall": 0.911
},
"answer_verification": {
"is_correct": true
},
"reading_question_qc": {
"overall_score": 0.8
},
"math_content_evaluator": {
"overall_score": 1.0
},
"final_score": 0.904
},
"q2": {
"ti_question_qa": {
"overall": 0.933
},
"answer_verification": {
"is_correct": true
},
"reading_question_qc": {
"overall_score": 0.778
},
"math_content_evaluator": {
"overall_score": 0.778
},
"final_score": 0.830
},
"text1": {
"math_content_evaluator": {
"overall_score": 0.778
},
"text_content_evaluator": {
"overall": 0.957
},
"final_score": 0.867
},
"text2": {
"math_content_evaluator": {
"overall_score": 0.778
},
"text_content_evaluator": {
"overall": 0.957
},
"final_score": 0.867
},
"article1": {
"article_holistic_evaluator": {
"overall": 0.835,
"recommendation": "accept"
},
"text_content_evaluator": {
"overall": 0.89
},
"embedded_questions": {
"q1": {
"ti_question_qa": {"overall": 0.87}
},
"q2": {
"ti_question_qa": {"overall": 0.90}
}
},
"images": {
"img1": {
"image_quality_di_evaluator": {"overall": 0.95}
}
},
"final_score": 0.880
}
},
"evaluation_time_seconds": 38.76
}
Note: The evaluator keys present for each item depend on its content type and routing parameters. In the sample above, for example, the text items have no answer_verification entry, and the article nests results for its embedded questions and images.
Full Mode (verbose: true)
Returns detailed scores, issues, strengths, and recommendations for each evaluator.
{
"request_id": "06c031fd-6517-4874-8117-2dbeb5554291",
"evaluations": {
"q1": {
"ti_question_qa": {
"overall": 0.911,
"scores": {
"correctness": 1.0,
"grade_alignment": 0.9,
"difficulty_alignment": 0.9,
"language_quality": 0.85,
"pedagogical_value": 0.95,
"explanation_quality": 0.9,
"instruction_adherence": 0.9,
"format_compliance": 1.0,
"query_relevance": 1.0,
"di_compliance": 0.9
},
"issues": [],
"strengths": ["Clear scaffolded explanation", "Excellent proportional reasoning"],
"recommendation": "accept",
"suggested_improvements": [],
"di_scores": {...},
"section_evaluations": {...}
},
"answer_verification": {
"is_correct": true,
"correct_answer": "35 riyals",
"confidence": 10,
"reasoning": "The answer correctly applies proportional reasoning..."
},
"reading_question_qc": {
"overall_score": 0.8,
"distractor_checks": {...},
"question_checks": {...},
"passed": true
},
"math_content_evaluator": {
"overall_score": 1.0,
"overall_rating": "SUPERIOR",
"curriculum_alignment": "PASS",
"cognitive_demand": "PASS",
"accuracy_and_rigor": "PASS",
"reveals_misconceptions": "PASS",
"question_type_appropriateness": "PASS",
"engagement_and_relevance": "PASS",
"instructional_support": "PASS",
"clarity_and_accessibility": "PASS",
"pass_count": 9,
"fail_count": 0
},
"final_score": 0.904
},
"text1": {
"math_content_evaluator": {
"overall_score": 0.778,
"overall_rating": "ACCEPTABLE",
"pass_count": 7,
"fail_count": 2
},
"text_content_evaluator": {
"overall": 0.957,
"correctness": 1.0,
"grade_alignment": 0.95,
"language_quality": 0.9,
"pedagogical_value": 0.95,
"explanation_quality": 1.0,
"di_compliance": 0.9,
"instruction_adherence": 0.95,
"query_relevance": 1.0,
"recommendation": "accept",
"issues": [],
"strengths": ["Clear conceptual explanation", "Age-appropriate language"],
"suggested_improvements": ["Add more real-world examples"],
"di_scores": {...}
},
"final_score": 0.867
},
"article1": {
"article_holistic_evaluator": {
"pedagogical_coherence": 0.85,
"content_organization": 0.90,
"scaffolding_quality": 0.80,
"engagement": 0.75,
"mixed_media_integration": 0.85,
"learning_objectives_clarity": 0.80,
"grade_appropriateness": 0.90,
"completeness": 0.85,
"cognitive_load_management": 0.80,
"instructional_clarity": 0.85,
"overall": 0.835,
"recommendation": "accept",
"issues": [
"Some transitions between sections could be smoother",
"Question 1 appears slightly before full concept explanation"
],
"strengths": [
"Excellent use of visual diagrams to support text",
"Clear learning progression from basic to advanced",
"Engaging real-world examples throughout"
],
"suggested_improvements": [
"Add transitional sentences between sections",
"Consider moving Question 1 after more detailed concept introduction"
]
},
"text_content_evaluator": {
"overall": 0.89,
"correctness": 0.95,
"grade_alignment": 0.90,
"language_quality": 0.85,
"pedagogical_value": 0.90,
"explanation_quality": 0.88,
"di_compliance": 0.87,
"instruction_adherence": 0.92,
"query_relevance": 0.95
},
"embedded_questions": {
"q1": {
"ti_question_qa": {
"overall": 0.87,
"recommendation": "accept"
},
"answer_verification": {
"is_correct": true,
"confidence": 9
}
},
"q2": {
"ti_question_qa": {
"overall": 0.90,
"recommendation": "accept"
},
"answer_verification": {
"is_correct": true,
"confidence": 10
}
}
},
"images": {
"img1": {
"image_quality_di_evaluator": {
"overall": 0.95,
"score": 95,
"recommendation": "accept"
}
}
},
"final_score": 0.880
}
},
"evaluation_time_seconds": 38.76
}
Full mode includes: Detailed dimension scores, issues, strengths, recommendations, DI compliance breakdowns, and section-level evaluations. For articles (NEW in v1.4.0): Includes holistic evaluation across 10 dimensions plus component-level evaluations for text, embedded questions, and images.
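In the full-mode sample, the overall_score for math_content_evaluator lines up with its pass/fail counts. Nine passes and zero fails give 1.0, and seven passes and two fails give 0.778 (7/9). The hypothetical helper below recomputes that ratio for sanity-checking responses. The relationship is inferred from the sample and is not documented.

```sh
# Hypothetical sanity check: recompute math_content_evaluator's overall_score
# as pass_count / (pass_count + fail_count), rounded to three decimals.
# This ratio is inferred from the sample response, not a documented rule.
math_overall_score() {
  awk -v p="$1" -v f="$2" 'BEGIN {
    if (p + f == 0) exit 1   # no criteria were scored
    printf "%.3f\n", p / (p + f)
  }'
}

math_overall_score 9 0  # 1.000 (q1: pass_count 9, fail_count 0)
math_overall_score 7 2  # 0.778 (text1: pass_count 7, fail_count 2)
```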
How It Works
Simple 3-Step Process:
- Prepare your content → Structure your questions, text content, or articles in JSON format (include image_url for visual content)
- Add routing parameters → Optionally specify subject, grade, and type for better evaluation routing
- Send to API → POST to https://api.inceptapi.com/evaluate and receive comprehensive quality scores
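Step 3 can be wrapped in a small shell function so the API key does not have to be pasted into each command. The sketch below is hypothetical. It assumes the key is stored in an INCEPT_API_KEY environment variable, whereas the Quick Start pastes the key inline. It also uses its own return codes (2 for a missing key, 3 for a missing file) before handing off to the same curl call shown in the Quick Start.

```sh
# Hypothetical wrapper around the documented POST /evaluate call.
# $1: request JSON file; $2: optional query string such as
# "subject=math&grade=6-8". The API key is read from the INCEPT_API_KEY
# environment variable (an assumption; the docs paste the key inline).
# Return codes 2 and 3 are this sketch's own convention.
incept_evaluate() {
  request_file=$1
  query=${2:-}

  if [ -z "${INCEPT_API_KEY:-}" ]; then
    echo "incept_evaluate: INCEPT_API_KEY is not set" >&2
    return 2
  fi
  if [ ! -f "$request_file" ]; then
    echo "incept_evaluate: request file not found: $request_file" >&2
    return 3
  fi

  endpoint="https://api.inceptapi.com/evaluate"
  [ -z "$query" ] || endpoint="$endpoint?$query"

  # -sS: no progress meter, but still report errors.
  # --fail: exit non-zero when the server answers with HTTP 400 or above.
  curl -sS --fail -X POST "$endpoint" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $INCEPT_API_KEY" \
    -d @"$request_file"
}
```

Typical use: export INCEPT_API_KEY with your key, then run incept_evaluate qs.json "subject=math&grade=6-8" > response.json.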
The evaluator automatically selects and runs the appropriate evaluation methods based on your content and parameters. No manual configuration needed.
Article Evaluation (v1.4.0): Complete educational documents evaluated holistically as unified pedagogical experiences, with automatic component evaluation for text, questions, and images.
Automatic Image Detection (v1.3.0): If any content includes an image_url, image quality evaluation is automatically enabled using DI rubric-based assessment.
API Endpoint: https://api.inceptapi.com/evaluate
Current Version: InceptBench v1.4.0
Resources
- API Endpoint: https://api.inceptapi.com/evaluate
- API Documentation: Contact support for an API key and detailed documentation
For questions or support, please contact the Incept team.