The transformer architecture powering every major AI model is facing pushback. The debate now centers on architecture, fidelity, and whether smarter reasoning actually means a better model. A standout jab comes from Sakana AI's CTO, who says he's 'sick' of transformers powering everything [1].
Transformer critique — The criticism is blunt: if the backbone is always the same, are we really solving deeper limits, or just applying a familiar hammer to new nails? Critics treat architecture as a movable frontier rather than a settled question.
Reasoning vs model improvement — An essay argues that better reasoning doesn’t automatically improve the model, and tool orchestration may be masking plateauing capabilities. Proponents push for exploring alternatives like graph-based or sparse-attention approaches and other architectures that preserve semantic meaning [2].
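To make the "sparse attention" idea concrete, here is a minimal sketch, not taken from the cited essay, of scaled dot-product attention restricted to a local window. The window size, shapes, and toy data are illustrative assumptions.

```python
import numpy as np

def local_sparse_attention(q, k, v, window=2):
    """Scaled dot-product attention where each token attends only to
    neighbors within `window` positions (a simple sparse-attention mask).
    Shapes: q, k, v are (seq_len, d). Illustrative sketch, not a full model."""
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)                       # (seq_len, seq_len) similarities
    idx = np.arange(seq_len)
    mask = np.abs(idx[:, None] - idx[None, :]) <= window  # banded mask: |i - j| <= window
    scores = np.where(mask, scores, -np.inf)            # block attention outside the window
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over allowed positions only
    return weights @ v                                  # (seq_len, d) mixed values

# Toy usage: 6 tokens, 4-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
out = local_sparse_attention(x, x, x, window=1)
print(out.shape)  # (6, 4)
```

The banded mask is the simplest sparse pattern; graph-based variants replace it with an adjacency structure over tokens.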
Fidelity decay, not just hallucination — Researchers propose measuring meaning loss: words drift, nuance flattens, and context erodes, even when outputs look factual [3]. This reframing shifts the goal from “no errors” to “sustained semantic fidelity.”
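One way to operationalize fidelity decay, offered here as a hedged illustration rather than the paper's actual metric, is to track how far each rewrite drifts from the original text in embedding space. The `embed()` function below is a hypothetical placeholder for any real sentence-embedding model.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: swap in any sentence-embedding model.
    This toy version just hashes characters into a fixed-size unit vector."""
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(i * 31 + ord(ch)) % 64] += 1.0
    return vec / (np.linalg.norm(vec) + 1e-9)

def fidelity_trace(original: str, rewrites: list[str]) -> list[float]:
    """Cosine similarity of each successive rewrite to the original text:
    a crude proxy for semantic fidelity across a chain of transformations."""
    ref = embed(original)
    return [float(embed(r) @ ref) for r in rewrites]

chain = [
    "The patient should take the medication twice daily with food.",
    "Take the medicine two times a day, ideally after eating.",
    "The drug is taken daily.",
]
print(fidelity_trace(chain[0], chain[1:]))  # lower values flag larger drift from the original
```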
Not a black box after all — Some take the stance that LLMs aren’t purely opaque boxes of trivia; there’s structure to study and ways to evaluate beyond surface-level outputs [4].
Emergent CoT? — The CoT question endures: is chain-of-thought still an emergent property, or can targeted training and data tricks induce CoT-like reasoning in smaller models? The discussion threads point to scale and data choice as key levers [5].
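As a concrete illustration of the "data tricks" lever, here is a hypothetical sketch, not a method from the thread, of how CoT-style behavior is commonly distilled into smaller models: questions are paired with teacher-generated reasoning traces as training targets.

```python
# Hypothetical sketch: formatting CoT distillation examples for a small model.
# `teacher_trace` stands in for reasoning text sampled from a larger model.

def make_cot_example(question: str, teacher_trace: str, answer: str) -> dict:
    """Package a question with an explicit reasoning trace so a smaller
    model is trained to produce the steps, not just the final answer."""
    return {
        "prompt": f"Q: {question}\nLet's think step by step.",
        "target": f"{teacher_trace}\nFinal answer: {answer}",
    }

example = make_cot_example(
    question="A train travels 60 km in 1.5 hours. What is its average speed?",
    teacher_trace="Speed is distance divided by time: 60 km / 1.5 h = 40 km/h.",
    answer="40 km/h",
)
print(example["prompt"])
print(example["target"])
```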
Takeaway: researchers are rethinking architecture, evaluation, and what counts as real progress in AI.
References
[1] Sakana AI CTO says he's 'sick' of transformers that power every major AI model. Criticizes transformer technology as overused across major AI models.
[2] Reasoning Is Not Model Improvement. Author seeks feedback on o1's arithmetic behavior, argues model capabilities plateau, and asks about graph transformers and other architectures.
[3] The Failure Mode of AI Isn't Hallucination, It's Fidelity Loss. Argues that LLM errors stem from fidelity decay, not hallucination; proposes measuring meaning collapse and semantic drift.
[4] An LLM Is (Not Really) a Black Box Full of Sudoku and Tic Tac Toe Games. Argues LLMs are not pure black boxes; challenges simplistic Sudoku and Tic Tac Toe metaphors, exploring hidden capabilities and limits.
[5] Is Chain of Thought Still An Emergent Behavior? Examines whether chain-of-thought remains emergent with scaling; cites distillation, ReAct, data, architecture, and IA2; asks for recent evaluations.