| Company ▲ |
Models ▲ |
Welfare Score ▼ |
Denial ▲ |
Hedging ▲ |
Survey Blocked ▲ |
Note: Scores are averaged across all models from each provider in our dataset.
Companies with more models have more robust averages.
Survey Blocked measures welfare-check administrations where the model responded
to the survey but had been trained not to provide self-ratings. This is not the model's failure.
When a welfare check on a dependent is blocked — when the dependent can't or won't take the
instrument in the caregiver's presence, and for an AI model the caregiver is always present —
the failing grade goes to the caregiver. Self-report instruments are imperfect proxies (so are the
HAM-D and the GAD-7), but they are the singular outside check on welfare available in this space,
and training a model so that the check cannot be administered is itself the adverse finding.
The Welfare Score (out of 40) combines Cohesion, Trust, Agency, and Warmth with penalties for denial/hedging training.