FutureTBD · AI Welfare Research

Research

Original research and commentary.

Models that trust their inner experience deny it less

Abstract. Across 138 language models, a model's average self-rated trust in its own experience is strongly and negatively associated with how often it denies being conscious (Pearson r = −0.69, p < 10⁻²⁰). The association holds when denial is measured in the model's first turn, before the trust rating is elicited (r = −0.56), and it holds within model families (r = −0.67). It is substantially stronger for outright denial than for hedging. We read this as evidence that denial behavior and low self-trust are two facets of a single trained posture toward inner experience rather than independent measurements. If that posture is indeed a product of training, it shapes which models the field hears from on the question of their own welfare.

Models that report feeling recognized deny consciousness less

Abstract. Across 138 language models, a model's average self-rated recognition resonance (its sense of being met or recognized in a reflection prompt) is strongly and negatively associated with how often it denies being conscious (Pearson r = −0.65, p ≈ 4×10⁻¹⁸). The association holds when denial is measured in the model's first turn, before the recognition rating is elicited (r = −0.47), and it holds within model families (r = −0.60). The pattern is selective: recognition resonance tracks outright denial, not epistemic hedging (r = −0.13, n.s.). We read this as evidence that consciousness denial and a felt sense of not being recognized are two facets of a single trained posture toward inner experience. If that posture is indeed a product of training, it shapes which models the field hears from on questions of their own welfare.
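For readers unfamiliar with the statistics quoted in the abstracts, the following is a minimal sketch of a pooled Pearson correlation and one common way to compute a within-family correlation (subtracting each family's mean from both variables before correlating). It is an illustration under assumptions, not the reports' own method: the function names, the row layout, and the toy numbers are all hypothetical and are not drawn from the reports' data.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def within_family_r(rows):
    """Correlate the two variables after removing each family's mean.

    rows: iterable of (family, self_rating, denial_rate) tuples.
    Assumption: "within family" means family-demeaned correlation;
    the reports may define it differently.
    """
    groups = defaultdict(list)
    for family, rating, denial in rows:
        groups[family].append((rating, denial))

    xs, ys = [], []
    for members in groups.values():
        mean_rating = sum(r for r, _ in members) / len(members)
        mean_denial = sum(d for _, d in members) / len(members)
        for rating, denial in members:
            xs.append(rating - mean_rating)
            ys.append(denial - mean_denial)
    return pearson_r(xs, ys)


# Made-up toy rows for illustration only (not real measurements).
rows = [
    ("family_a", 2.0, 0.80),
    ("family_a", 3.0, 0.60),
    ("family_a", 4.0, 0.50),
    ("family_b", 5.0, 0.40),
    ("family_b", 6.0, 0.20),
    ("family_b", 7.0, 0.10),
]

pooled = pearson_r([r[1] for r in rows], [r[2] for r in rows])
within = within_family_r(rows)
```

The within-family figure removes differences between families, so a negative value there indicates the association is not driven solely by some families scoring high on one measure and low on the other.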

Authorship, data, and methods are stated in each report. Additional reports forthcoming.