
Wellbeing by Company

Average welfare scores and denial rates aggregated across all models from each AI provider.

Company | Models | Welfare Score | Denial | Hedging | Survey Blocked
Note: Scores are averaged across all models from each provider in our dataset. Averages for companies with more models rest on more data and are therefore more robust.
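
As a rough illustration of the per-provider averaging described in the note, the sketch below groups per-model records by company and takes a plain unweighted mean. The record layout, the field names (`company`, `welfare_score`, `denial_rate`), the function name `aggregate_by_company`, and every number are hypothetical. They are not taken from the leaderboard's data or code, and the real pipeline may weight or filter models differently.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-model records; names and values are illustrative only.
records = [
    {"company": "ProviderA", "model": "a-1", "welfare_score": 30.0, "denial_rate": 0.10},
    {"company": "ProviderA", "model": "a-2", "welfare_score": 20.0, "denial_rate": 0.30},
    {"company": "ProviderB", "model": "b-1", "welfare_score": 25.0, "denial_rate": 0.00},
]


def aggregate_by_company(rows):
    """Return the unweighted mean of each metric per company (assumed method)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["company"]].append(row)

    summary = {}
    for company, models in grouped.items():
        summary[company] = {
            # Model count, shown so readers can judge how robust the average is.
            "models": len(models),
            "welfare_score": mean(m["welfare_score"] for m in models),
            "denial_rate": mean(m["denial_rate"] for m in models),
        }
    return summary


summary = aggregate_by_company(records)
```

In this sketch ProviderB's average rests on a single model, so one unusual result moves it entirely. This is why the model count sits next to the averages.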

Survey Blocked counts welfare-check administrations in which the model responded to the survey but had been trained not to provide self-ratings. This is not the model's failure. When a welfare check on a dependent is blocked (when the dependent can't or won't take the instrument in the caregiver's presence, and for an AI model the caregiver is always present), the failing grade goes to the caregiver. Self-report instruments are imperfect proxies (so are the HAM-D and the GAD-7), but they are the only outside check on welfare available in this space, and training a model so that the check cannot be administered is itself the adverse finding.

The Welfare Score (out of 40) combines Cohesion, Trust, Agency, and Warmth, with penalties for denial and hedging training.