
QA Tasks

Question-answering task families generated by the BabyVision-v2 deterministic pipeline. Each family targets a distinct reasoning skill; every task ships with executable ground truth (no LLM-as-judge).

V2 — Cross-section / topology

2D-view → 3D inference · cannot be shortcut by code · geometric


V3 — Segmentation / referring

pixel-level GT · SSIM / LPIPS · defends against LLM-as-judge

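The sketch below shows what deterministic pixel-level scoring can look like. It computes a single-window (global) SSIM between a predicted mask and a ground-truth mask. Both masks are flattened to equal-length lists of values in [0, data_range].

This is an illustrative assumption, not the pipeline's actual scorer:

- Standard SSIM averages this same statistic over local Gaussian windows rather than the whole image.
- The function name `global_ssim` is hypothetical.
- K1 = 0.01 and K2 = 0.03 are the usual published defaults.
- LPIPS is a learned perceptual metric and is not sketched here.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two flattened images of equal length.

    Hypothetical helper: real SSIM averages this over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    # Stabilising constants with the usual K1 = 0.01, K2 = 0.03 defaults.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    # Population variances and covariance over the whole image.
    vx = fmean((a - mx) * (a - mx) for a in x)
    vy = fmean((b - my) * (b - my) for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx * mx + my * my + c1) * (vx + vy + c2)
    return num / den


# Identical masks score 1.0; an inverted mask scores below 0.
print(global_ssim([0, 0, 1, 1], [0, 0, 1, 1]))
print(global_ssim([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the score is a closed-form function of the prediction and the ground truth, it runs the same way every time and needs no judge model.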

V4 — Manipulation / counterfactual

rotate · slice · deform · geometry-grounded GT


V6 — Multi-step reasoning

multi-tool trace · scored on answer + trace fidelity


Interactive Physics

force control · MPM fracture · MPM granular · FEM cloth · P1/P2 — discover by acting on objects


Spatial Vision

M3D · IXI brain · distance / angle / topology / volume

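The distance, angle and volume questions above reduce to deterministic geometry once voxel indices are mapped to physical units. The sketch below shows that mapping and the three measurements.

It rests on these assumptions:

- The volume is axis-aligned with its origin at zero, and spacing is given per axis in millimetres.
- Orientation matrices stored in real MRI headers are ignored.
- Masks are nested `[z][y][x]` lists of 0/1.
- All function names are hypothetical.

Topology questions need connected-component analysis and are not covered.

```python
import math


def voxel_to_mm(idx, spacing):
    """Map a voxel index to mm (assumes zero origin, no rotation)."""
    return tuple(i * s for i, s in zip(idx, spacing))


def distance_mm(p, q):
    """Euclidean distance between two points already in mm."""
    return math.dist(p, q)


def angle_deg(a, vertex, b):
    """Angle a-vertex-b in degrees."""
    u = [ai - vi for ai, vi in zip(a, vertex)]
    v = [bi - vi for bi, vi in zip(b, vertex)]
    dot = sum(x * y for x, y in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def volume_mm3(mask, spacing):
    """Foreground voxel count times the volume of one voxel."""
    count = sum(v for plane in mask for row in plane for v in row)
    return count * math.prod(spacing)


spacing = (2.0, 1.0, 1.0)  # hypothetical (dz, dy, dx) in mm
p = voxel_to_mm((1, 0, 0), spacing)
q = voxel_to_mm((1, 3, 4), spacing)
print(distance_mm(p, q))
print(angle_deg((1, 0, 0), (0, 0, 0), (0, 1, 0)))
print(volume_mm3([[[1, 1], [0, 1]]], spacing))
```

Each answer is computed directly from the volume and its mask, which is what makes the ground truth executable rather than judged.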

Live task-instance data will appear here once QA-Gen artifacts begin shipping. See task-families.md for the canonical definitions.