AIQCAT | AI Quantitative Capability Assessment Test

Background

What CAT is.

Computerized Adaptive Testing (CAT) is a class of assessment methods grounded in Item Response Theory (IRT). Instead of giving every candidate the same fixed-form test, a CAT engine selects each next item based on the candidate's running ability estimate: easier items if recent responses suggest lower ability, harder items if higher. The assessment ends when the standard error of the ability estimate falls to the configured precision threshold.
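The selection loop described above can be sketched as follows. This is a minimal illustration, not a description of any particular engine: it assumes a two-parameter logistic (2PL) IRT model, maximum-information item selection, and a grid-based posterior-mean (EAP) ability estimate. The function names, the simulated item bank, and the 0.4 standard-error target are all hypothetical.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [g / 10.0 for g in range(-40, 41)]  # ability grid from -4 to 4

def eap_estimate(responses):
    """Posterior mean and SD of ability under a standard-normal prior.

    responses is a list of ((a, b), correct) pairs.
    """
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # unnormalised N(0, 1) prior
        for (a, b), correct in responses:
            p = p_correct(theta, a, b)
            w *= p if correct else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.4, max_items=30):
    """Administer items until the posterior SD drops to se_target.

    bank is a list of (a, b) items; answer(item) returns True or False.
    max_items is an assumed safety cap, not part of the precision rule.
    """
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0  # prior mean and SD before any response
    while remaining and len(responses) < max_items and se > se_target:
        # pick the unused item that is most informative at the current estimate
        item = max(remaining, key=lambda it: item_information(theta, *it))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap_estimate(responses)
    return theta, se, len(responses)

# Usage: simulate one candidate with a hypothetical true ability of 1.0.
rng = random.Random(0)
demo_bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(200)]
theta_hat, se_hat, n_used = run_cat(
    demo_bank, lambda item: rng.random() < p_correct(1.0, *item)
)
```

Operational engines usually add content balancing and exposure constraints on top of this basic loop.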

Where it is used

Established practice.

CAT has been the operational testing methodology behind major assessments for decades, including the GMAT (since 1997), the NCLEX nursing licensure exam, the U.S. military's CAT-ASVAB aptitude battery, and large-scale K–12 state assessments. It is one of the most studied and validated assessment methodologies in modern educational measurement.

Precision

CAT typically reaches the same measurement precision as a fixed-form test using fewer items, by concentrating items near the candidate's ability level.
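The reason fewer items suffice can be shown with a small calculation. Under a 2PL model (an assumption made here for illustration), an item carries the most information when its difficulty matches the candidate's ability, and the standard error of the ability estimate is approximately one over the square root of the total information. The sketch below compares ten items targeted at a hypothetical ability of 1.5 with ten items spread evenly over the difficulty range; the discrimination value 1.2 is arbitrary.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Asymptotic SE of the ability estimate: 1 / sqrt(total information)."""
    total = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(total)

theta = 1.5  # hypothetical candidate ability

# Fixed form: ten difficulties spread evenly from -3 to 3.
spread = [(1.2, -3.0 + 6.0 * i / 9.0) for i in range(10)]

# Adaptive form: ten items whose difficulty sits at the candidate's ability.
targeted = [(1.2, theta)] * 10

se_spread = standard_error(theta, spread)
se_targeted = standard_error(theta, targeted)
```

With the same item count, `se_targeted` is smaller than `se_spread`, so the fixed form would need more items to reach the same standard error.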

Item exposure control

Modern CAT engines limit how often each item is administered, so that no item becomes overexposed and the bank stays usable across many candidates over time.
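One simple way to manage exposure is often called randomesque selection: instead of always administering the single most informative item, the engine picks at random among the k most informative ones. The sketch below assumes a 2PL model and items stored as hypothetical `(item_id, a, b)` tuples; it illustrates the general technique and makes no claim about which exposure-control method a given engine uses.

```python
import math
import random

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_randomesque(theta, available, k=5, rng=random):
    """Pick at random among the k most informative available items.

    available is a list of (item_id, a, b) tuples. k=1 reduces to plain
    maximum-information selection, which overuses the best items.
    """
    ranked = sorted(
        available,
        key=lambda it: item_information(theta, it[1], it[2]),
        reverse=True,
    )
    return rng.choice(ranked[:k])
```

A larger k spreads usage across more of the bank at the cost of slightly less information per item.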

Calibrated bank

Every operational item is pre-calibrated against an IRT model so that an ability estimate is meaningful across forms and across time.

How AIQCAT uses CAT

AIQCAT's adaptive design.

AIQCAT's assessment draws on the CAT approach, with one difference: the "item" in our setting is typically a real work artifact rather than a multiple-choice question. The adaptive engine selects the next task or prompt based on the candidate's running estimate across the six dimensions, and stops when the estimate reaches the configured precision. Item authoring is handled by the Question Factory, where a swarm of 5–10 agents drafts and calibrates items in parallel for each organization's exam.
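A precision-based stop across several dimensions can be sketched as below. This is a hypothetical illustration rather than AIQCAT's implementation: the dimension keys `d1` to `d6`, the 0.35 target, the item cap, and the rule of targeting the least precisely measured dimension next are all assumptions.

```python
def unresolved_dimensions(standard_errors, target_se):
    """Dimensions whose standard error still exceeds the target."""
    return [d for d, se in standard_errors.items() if se > target_se]

def next_step(standard_errors, target_se=0.35, items_given=0, max_items=40):
    """Return ("stop", None) or ("continue", dimension to target next).

    standard_errors maps a dimension name to its current standard error.
    max_items is an assumed safety cap so the session always terminates.
    """
    pending = unresolved_dimensions(standard_errors, target_se)
    if not pending or items_given >= max_items:
        return ("stop", None)
    # target the dimension that is currently measured least precisely
    worst = max(pending, key=lambda d: standard_errors[d])
    return ("continue", worst)

# Usage with hypothetical running standard errors for six dimensions.
current = {"d1": 0.30, "d2": 0.50, "d3": 0.34, "d4": 0.41, "d5": 0.20, "d6": 0.35}
decision = next_step(current, items_given=12)
```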

Item authoring

Question Factory — 5–10 parallel agents per exam

Item types

Primarily artifact-grading items (Excel, PDF, image, video)

Ability-estimation algorithm

IRT-family, multidimensional across the six dimensions

Stopping rule

Precision-based (target standard error per dimension)

Grading

Consensus of 1,000 AI evaluator engines, plus examiner review of a sample

Equating across forms

Anchored to a held-out reference set for comparability
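To illustrate what anchoring to a reference set can involve, the sketch below uses mean-sigma linking, a standard IRT scale-linking technique. The source does not state which linking method AIQCAT uses, so this is an assumed example. Given difficulty estimates for the same anchor items on the new form's scale and on the reference scale, it finds the linear transformation that maps one scale onto the other. The function names are hypothetical.

```python
import statistics

def mean_sigma_link(b_new, b_ref):
    """Slope A and intercept B mapping the new scale onto the reference scale.

    b_new and b_ref are difficulties of the same anchor items, estimated on
    the new form and on the reference set. At least two distinct anchor
    difficulties are needed, otherwise the slope is undefined.
    """
    A = statistics.pstdev(b_ref) / statistics.pstdev(b_new)
    B = statistics.mean(b_ref) - A * statistics.mean(b_new)
    return A, B

def to_reference_scale(a, b, A, B):
    """Transform 2PL item parameters: difficulty b -> A*b + B, slope a -> a/A."""
    return a / A, A * b + B

def ability_to_reference_scale(theta, A, B):
    """Transform an ability estimate onto the reference scale."""
    return A * theta + B
```

Once abilities and item parameters are on the reference scale, estimates from different forms can be compared directly.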

Why CAT for AI competency

Why this is the right base methodology.

AI competency varies widely across individuals, and the cost of grading a real work artifact is non-trivial. CAT is well suited to this setting because (a) it concentrates grading effort on the items that maximise information about the candidate's ability and (b) it produces results that are comparable across candidates and across forms, without making every candidate sit identical tasks.

What we don't claim

Boundaries.

AIQCAT is not, on its own, an authority on CAT methodology. We adopt established CAT practice as documented in the educational-measurement literature (Lord, Wainer, van der Linden et al.) and apply it to the artifact-grading setting. Reliability and predictive-validity studies are ongoing; results are published in the Research section as the data matures and completes review.

See the six capability axes for what is being measured and Dimensions for the six dimensions that feed the ability estimate.
