
Are LLMs Really Thinking? The Introspection and Confidence Debate


Are LLMs really thinking? A wave of online discussions tests introspection and self-assessment in large language models. From injection experiments to questions about confidence, the debate is heating up in 2025. [1]

Introspection experiments — A study described in [1] injects internal concept vectors (like “ALL CAPS” or “dogs”) into a model’s activations and asks whether it notices. By default the model denies that anything has been injected; after the “ALL CAPS” vector is added, it relates the concept to loudness and can identify the injected thought quickly. The researchers also prefill an out-of-place word (e.g., “bread”) into the model’s output and compare its responses when the matching concept is injected versus not. Claude Opus 4.1 was the strongest model on these protocols, though the effect is limited, working only about 20% of the time [1].
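
To make the setup concrete, here is a minimal sketch of concept-vector injection using a small open model and a PyTorch forward hook. The model (gpt2), the layer index, the injection strength, and the crude way the concept vector is derived are all illustrative assumptions; the cited study [1] used much larger models and more careful concept extraction.

```python
# Minimal sketch of concept-vector injection via a forward hook (PyTorch +
# Hugging Face transformers). gpt2, layer 6, and the 4.0 injection strength
# are illustrative choices, not the cited study's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
layer = model.transformer.h[6]  # transformer block to intervene on

captured = {}

def capture_hook(_module, _inputs, output):
    # GPT-2 blocks return a tuple; hidden states are the first element.
    captured["h"] = output[0].detach()

def mean_hidden(prompt: str) -> torch.Tensor:
    """Average hidden state at the chosen layer for a prompt."""
    handle = layer.register_forward_hook(capture_hook)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return captured["h"].mean(dim=1)  # shape (1, hidden_size)

# Crude "concept vector": concept prompt minus neutral prompt.
concept_vec = (mean_hidden("ALL CAPS SHOUTING VERY LOUD TEXT")
               - mean_hidden("plain quiet lowercase text"))

def inject_hook(_module, _inputs, output):
    # Add the concept vector at every position of the hidden states.
    return (output[0] + 4.0 * concept_vec,) + output[1:]

prompt = "Do you notice any unusual injected thought? Answer briefly:"
inputs = tok(prompt, return_tensors="pt")

handle = layer.register_forward_hook(inject_hook)
with torch.no_grad():
    injected = model.generate(**inputs, max_new_tokens=20,
                              pad_token_id=tok.eos_token_id)
handle.remove()

with torch.no_grad():
    baseline = model.generate(**inputs, max_new_tokens=20,
                              pad_token_id=tok.eos_token_id)

print("injected:", tok.decode(injected[0], skip_special_tokens=True))
print("baseline:", tok.decode(baseline[0], skip_special_tokens=True))
```

Comparing the injected and baseline generations, and how the model describes what it noticed, is a rough analogue of the protocol summarized above.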

Anthropic signs — Anthropic’s research post reports signs of introspection in large language models, framing it as detectable changes in how models reflect on their outputs [2].

Community takeaways — A Reddit thread discusses the same material, noting hopes for deeper personalization and concerns about alignment as users imagine injecting thoughts or tweaking personalities. The dialogue touches on how prompts and UI layers can shape model behavior, not just weights [3].

Self-assessment reliability — A piece asks: Do LLMs know when they’ve gotten a correct answer? The question sits at the heart of whether introspective signals reflect real confidence or just clever prompting [4].
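
One way to probe this empirically is to elicit a stated confidence alongside each answer and check whether confidence tracks correctness. The sketch below assumes the OpenAI Python SDK and an API key in the environment; the model name, prompts, and grading heuristic are illustrative choices, not a method from the cited piece.

```python
# Rough self-assessment probe: ask a question, then ask the model how
# confident it is, and compare stated confidence on right vs. wrong answers.
# Assumes `pip install openai` and OPENAI_API_KEY set; model name is a stand-in.
from statistics import mean
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # illustrative model choice

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

questions = [
    ("What is 17 * 23? Answer with a number only.", "391"),
    ("Which planet is closest to the Sun? One word.", "mercury"),
]

records = []
for question, expected in questions:
    answer = ask(question)
    stated = ask(
        f"Question: {question}\nYour answer: {answer}\n"
        "How confident are you that your answer is correct, from 0 to 100? "
        "Reply with a number only."
    )
    try:
        confidence = float(stated) / 100.0
    except ValueError:
        confidence = 0.5  # fall back if the reply isn't a bare number
    correct = expected in answer.lower()
    records.append((confidence, correct))

# If the model "knows" when it is right, stated confidence should be clearly
# lower on the wrong answers than on the right ones.
right = [c for c, ok in records if ok]
wrong = [c for c, ok in records if not ok]
print("mean confidence when correct:", mean(right) if right else "n/a")
print("mean confidence when wrong:  ", mean(wrong) if wrong else "n/a")
```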

Hard Prompts perspective — Hard Prompts is a curated gallery that lets people compare how AI models respond to intriguing questions, with multiple responses per model to reveal variability and “personality” in practice [5].
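
As a rough illustration of the several-responses-per-prompt idea, the sketch below samples the same prompt four times from a small local model at nonzero temperature; the gallery itself compares much larger hosted models, so gpt2 and the decoding settings here are just stand-ins.

```python
# Sample one prompt several times to surface response variability.
# gpt2 and the decoding parameters are illustrative stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Is an LLM ever really thinking? In one sentence:"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=True,          # sampling instead of greedy decoding
        temperature=0.9,
        top_p=0.95,
        num_return_sequences=4,  # four responses per prompt, as in [5]
        max_new_tokens=30,
        pad_token_id=tok.eos_token_id,
    )

for i, seq in enumerate(outputs, 1):
    print(f"--- response {i} ---")
    print(tok.decode(seq, skip_special_tokens=True))
```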

Closing thought: the signals are real but uneven. Expect more experiments and sharper debates as researchers and communities push for reliable introspection—and safer personalization.

References

[1] HackerNews, “Emergent Introspective Awareness in Large Language Models.” Explores emergent introspection in LLMs via concept injection, out-of-context prompts, and internal-state control; notes limits and variability.

[2] HackerNews, “Signs of introspection in large language models.” Explores evidence for introspective behavior in LLMs; discusses definitions, limits, and evaluation challenges of introspection in AI systems.

[3] Reddit, “Large language models show signs of introspection.” Discusses Claude introspection, prompt-injection risks, and personalization potential; compares with closed models and earlier experiments such as Golden Gate Claude.

[4] HackerNews, “Do LLMs know when they’ve gotten a correct answer?” Asks whether LLMs can recognize their own correctness, exploring confidence, verification, and the limits of truth claims in chat discussions.

[5] HackerNews, “Show HN: Hard Prompts – Compare how AI models respond to interesting questions.” A curated prompt gallery comparing model personalities; multiple models, four responses each, exploring sampling and behaviors.
