OpenAI CEO Sam Altman recently issued an internal memo declaring that the company had entered a state of “Code Red” emergency. On the surface, this looks like OpenAI’s reaction to strong competitors such as Google and Anthropic. But the deeper issue is a massive technical bottleneck hitting the entire AI industry: training costs are skyrocketing, model sizes keep ballooning, yet performance improvements are slowing down sharply.
According to Stanford’s 2025 AI Index Report:
- From 2019 to 2022: a 10x increase in training cost bought roughly a 25–35% jump in performance.
- After 2023: the same 10x investment yields only around 10–15% improvement.
- Since 2024: even doubling the cost buys less than a 5% boost.
It’s like the field has collectively hit an invisible ceiling. This fuels a major debate: Have LLMs reached a dead end?
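The shape of those numbers is what a saturating power law would predict. As a rough illustration (not the AI Index’s own methodology), suppose benchmark error follows the common scaling-law form error(C) = floor + a · C^(-alpha): the floor is an irreducible component that no amount of compute removes, so each additional 10x of compute buys a smaller absolute gain. A minimal sketch with made-up constants, chosen only to mimic the shape of the reported trend:

```python
# Toy illustration of diminishing returns under a saturating power law.
# The constants (floor, a, alpha) are made up for illustration, not fitted
# to the AI Index data.

def benchmark_error(compute, floor=0.08, a=0.6, alpha=0.25):
    """Error = irreducible floor + a * compute^(-alpha)."""
    return floor + a * compute ** -alpha

previous_score = None
for exponent in range(6):                   # compute budgets of 1x, 10x, ..., 100000x
    compute = 10 ** exponent
    score = 1.0 - benchmark_error(compute)  # treat (1 - error) as a benchmark score
    if previous_score is None:
        print(f"compute {compute:>6}x  score {score:.3f}")
    else:
        print(f"compute {compute:>6}x  score {score:.3f}  "
              f"gain from the last 10x: +{score - previous_score:.3f}")
    previous_score = score
```

The same floor term reappears in the “irreducible error” argument discussed later: once a model gets close to that floor, further scaling buys almost nothing.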
OpenAI is losing its edge
Google’s Gemini 3 recently beat OpenAI’s models in several benchmark tests. Gemini’s monthly active users jumped from 450 million in July to 650 million in October. Anthropic’s Claude is also booming in enterprise markets, with weekly visits reaching 41 million (a 17% increase in six weeks).
But the more concerning news: Industry analysts at SemiAnalysis report that OpenAI hasn’t managed to complete a full, large-scale pretraining run since GPT-4o came out in May 2024. This suggests that GPT-5 is not a true generational upgrade, but rather an optimized version of GPT-4o. SemiAnalysis bluntly stated: “Pretraining a frontier model is the most difficult and resource-intensive challenge in AI. Google’s TPU platform has proven capable; OpenAI has not.”
Without completing pretraining, OpenAI cannot produce the next-generation model, which is a potentially fatal issue for a company dependent on technological leadership. MMLU scores reinforce this: GPT-5 improved only 10–20% over GPT-4, despite training costs that are 20–30× higher (>$1–2B per model). As a result, Altman is pivoting strategy:
- Focus on improving ChatGPT
- Enhance personalization, speed, and reliability
- Broaden capability coverage
- Delay other projects (advertising, health agents, shopping agents, the Pulse assistant)
- Redirect staff and run daily improvement meetings
OpenAI’s internal alert system includes Yellow → Orange → Red. The Red alert indicates severe competitive and product crisis scenarios.
The bottleneck is industry-wide
Benchmark gaps among leading models have narrowed dramatically. LMSYS Arena data shows:
- In mid-2024, the Elo gap between the #1 and #10 models was 150+ points.
- By late 2025, the gap had shrunk to under 50 points (the short sketch after the release-cycle list below translates these Elo gaps into head-to-head win rates).
- By September 2025, nearly all frontier models scored 85–90% on MMLU-Pro, making them virtually indistinguishable.
Model release cycles are slowing as well:
- Meta: Llama-2 → Llama-3 (9 months), but Llama-4 has already been delayed beyond 15 months
- Anthropic: Claude 3 → Claude 4 (11 months)
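To make those Elo gaps concrete: Elo ratings translate directly into expected head-to-head preference rates via P(win) = 1 / (1 + 10^(-gap/400)). The sketch below is standard Elo arithmetic (nothing LMSYS-specific); it shows that a 150-point gap means the top model is preferred in roughly 70% of matchups, while a sub-50-point gap is close to a coin flip:

```python
# Convert an Elo rating gap into the expected head-to-head win probability.
# This is the standard Elo formula; nothing here is specific to LMSYS Arena.

def elo_win_probability(rating_gap):
    """Expected score of the higher-rated model against the lower-rated one."""
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))

for gap in (150, 100, 50, 25):
    p = elo_win_probability(gap)
    print(f"Elo gap {gap:>3}: higher-rated model preferred in ~{p:.0%} of matchups")
```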
These signs suggest the once-reliable Scaling Law is breaking down. Why?
- Irreducible error of language: LLMs learn by predicting the next word, but natural language contains inherent ambiguity (a Bayes error floor) that no amount of scaling can eliminate. Once basic grammar and logic errors are solved, the remaining errors arise from linguistic ambiguity itself – and those are fundamentally unfixable.
- Data exhaustion: OpenAI has already consumed nearly all of the high-quality text on the internet. What remains is mostly low-quality text, spammy or repetitive content, and AI-generated junk.
- Danger of model collapse: Training models on AI-generated data causes loss of rare but important knowledge, degeneration of diversity, amplification of errors, and outputs that become repetitive and narrow.
A Nature (2024) paper showed recursive training produces rapidly collapsing data distributions – analogous to genetic inbreeding. With the internet now flooded by AI-generated text, model collapse is becoming a real-world risk, not just a theoretical concern.
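The mechanism is easy to reproduce in miniature. The toy sketch below (an illustrative analogue, not a reproduction of the Nature experiments) treats a “model” as a simple token-frequency table: each generation samples a finite corpus from the previous model and refits on it, so rare tokens that miss the sample vanish for good and diversity ratchets downward:

```python
import math
import random
from collections import Counter

# Toy model-collapse demo: a "model" here is just a token-frequency table.
# Each generation samples a finite synthetic corpus from the previous model
# and refits on it. Rare tokens that miss the sample vanish permanently,
# so vocabulary coverage and diversity shrink over generations.

random.seed(0)
VOCAB = 50       # token ids 1..50
CORPUS = 100     # synthetic "training tokens" drawn per generation

def normalize(counts):
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def entropy_bits(dist):
    return -sum(p * math.log2(p) for p in dist.values())

# Generation 0: a Zipf-like distribution -- a few common tokens, a long rare tail.
model = normalize({tok: 1.0 / tok for tok in range(1, VOCAB + 1)})

for generation in range(1, 11):
    corpus = random.choices(list(model), weights=list(model.values()), k=CORPUS)
    model = normalize(Counter(corpus))      # refit purely on model-generated data
    print(f"gen {generation:>2}: vocab {len(model):>2}/{VOCAB}  "
          f"entropy {entropy_bits(model):.2f} bits")
```

Vocabulary coverage can only shrink in this setup; that is the toy version of “loss of rare but important knowledge.”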
Two competing theories
The debate splits the AI world into two major factions.
A. Reformists:
“LLMs are not enough; AI must move beyond text.” Led by: Fei-Fei Li (Stanford), Yann LeCun (Meta), DeepMind researchers, MIT/Berkeley cognitive scientists.
Key beliefs:
- LLMs are only one component of AI, not the path to AGI.
- True intelligence requires world models: understanding of 3D space, physics, and causality.
- Intelligence emerges from interacting with the physical world, not just from text.
- Language is a communication tool – not the basis of thought (a view supported by cognitive science).
Examples: AlphaGeometry (DeepMind) solves Olympiad-level geometry by combining symbolic reasoning with neural networks; babies understand physics before learning language; blind and deaf individuals think perfectly well without certain sensory channels. LeCun summarizes LLMs as: “Feeding a parrot a bigger chip.” In this view, the future AI architecture is modular:
- LLM = Translator of human intent
- World model = Physical/causal reasoning engine
- Specialist modules = Tools for vision, planning, robotics, etc.
B. Traditionalists:
“Scaling LLMs will still lead to AGI.” Led by: Sam Altman (OpenAI), Ilya Sutskever (OpenAI co-founder), Jared Kaplan (Anthropic).
Key beliefs:
- The Scaling Hypothesis remains valid: beyond a certain size, models exhibit emergent intelligence.
- Compression = Understanding: if a model compresses all world knowledge, it implicitly builds a world model.
- Language models can still form the foundation of AGI given enough data, compute, and training innovations.
- Even if LLMs are not perfect general intelligence, they can still serve as the core reasoning engine, coordinate other subsystems, and achieve AGI through scale plus algorithmic improvements.
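The “Compression = Understanding” claim has a concrete reading: a model’s cross-entropy on text equals the code length (in bits) an ideal arithmetic coder would need using that model’s predictions, so lower loss literally means tighter compression of the world’s text. A minimal sketch, using hypothetical per-token probabilities rather than a real model:

```python
import math

# "Compression = understanding" in one line of arithmetic: under an ideal
# arithmetic coder, encoding a token the model assigns probability p costs
# -log2(p) bits, so average cross-entropy per token IS the compression rate.

def bits_to_encode(token_probs):
    """Total ideal code length, in bits, for a sequence of model probabilities."""
    return sum(-math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities two models assign to the same sentence.
weak_model   = [0.05, 0.10, 0.08, 0.20, 0.05]   # higher perplexity
strong_model = [0.30, 0.45, 0.25, 0.60, 0.35]   # lower perplexity

for name, probs in [("weak model", weak_model), ("strong model", strong_model)]:
    bits = bits_to_encode(probs)
    print(f"{name}: {bits:.1f} bits total, {bits / len(probs):.2f} bits/token")
```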
MIT cognitive scientists disagree strongly, arguing: “Language is not thought. Thought is independent from language.” But OpenAI continues to push scaling as the main path.
Conclusion
The LLM field is experiencing severe diminishing returns, data shortages, compute barriers, and the risk of model collapse. This has intensified a fundamental philosophical split: traditionalists (OpenAI, Anthropic) believe “scale will give us AGI,” while reformists (Fei-Fei Li, LeCun, DeepMind, MIT) argue that “LLMs are useful but fundamentally limited; real AI requires world models and grounding.” The future of AI will likely merge both paths, but the industry now stands at a critical crossroads.