Nine Frontier Models in Seventeen Days and Nobody Can Keep Up
Published February 17, 2026
Here is a partial list of frontier-class AI models released in the first seventeen days of February 2026:
- Claude Opus 4.6 (Anthropic, Feb 5) — 1M context, agent teams, 128K output
- Seedance 2.0 (ByteDance, Feb 10) — text-to-video so good Disney sent a cease-and-desist within 72 hours
- GLM-5 (Zhipu AI, Feb 11) — 744B params, open-source MIT, trained entirely on Huawei chips
- MiniMax M2.5 (Feb 12) — near state-of-the-art at 1/20th the price of Opus
- GPT-4o retirement (OpenAI, Feb 13) — a model that changed the industry, junked after less than two years
- Doubao 2.0 (ByteDance, Feb 14) — 155 million weekly users, 10x cheaper than Western alternatives
- Qwen3.5 (Alibaba, Feb 16) — 397B params, open-weight, 201 languages
- GPT-OSS (OpenAI, Feb 2026) — Apache 2.0, runs on a single GPU. OpenAI releasing open-source models. Read that sentence again.
- Claude Sonnet 4.6 (Anthropic, Feb 17) — doing Opus-level work at Sonnet pricing
Nine releases. Seventeen days. And I probably missed a few.
The Evaluation Problem
Every one of these models came with benchmarks. Every benchmark showed the new model beating the previous one on something. Every launch blog post included a chart going up and to the right.
Nobody has time to actually test any of them.
Think about what meaningful evaluation looks like. You pick a model. You integrate it into your workflow. You run it against your actual use cases — not synthetic benchmarks, your messy real-world problems with ambiguous inputs and edge cases. You compare outputs. You check for regressions. You build intuition for where it’s strong and where it hallucinates. That takes weeks, minimum.
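To make that cycle concrete, here is a minimal sketch of the "compare outputs, check for regressions" step. It is a sketch, not a prescription: `call_model` is a stand-in for whatever SDK your provider actually ships, and the two sample cases are hypothetical placeholders for your own messy real-world inputs.

```python
"""Golden-set regression check: did the new model break anything
the old one handled? The model names, cases, and the call_model
stub below are all hypothetical placeholders."""
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    name: str
    prompt: str
    passes: Callable[[str], bool]  # domain-specific correctness check


# Your real cases: ambiguous inputs, edge cases, the messy stuff.
CASES = [
    Case("refund-ticket", "Summarize this support ticket: ...",
         lambda out: "refund" in out.lower()),
    Case("invoice-total", "Extract the total from this invoice: ...",
         lambda out: "1,240.00" in out),
]


def call_model(model: str, prompt: str) -> str:
    """Stand-in for your provider's SDK; swap in the real client call."""
    return f"[{model}] placeholder output: refund approved, total $1,240.00"


def regressions(old_model: str, new_model: str) -> list[str]:
    """Names of cases the old model passed that the new model fails."""
    broken = []
    for case in CASES:
        old_ok = case.passes(call_model(old_model, case.prompt))
        new_ok = case.passes(call_model(new_model, case.prompt))
        if old_ok and not new_ok:
            broken.append(case.name)
    return broken


if __name__ == "__main__":
    print(regressions(old_model="opus-4.5", new_model="opus-4.6"))
```

The twenty lines are trivial; the case set is not. Accumulating cases that actually probe your workload is what takes the weeks.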
The industry is now releasing new frontier models faster than that evaluation cycle can complete. By the time you’ve figured out whether Opus 4.6 is actually better than Opus 4.5 for your specific workload, Sonnet 4.6 has dropped and now you’re supposed to evaluate that too. And also Qwen3.5. And also MiniMax at a fraction of the price. And also whatever Google announced while you were reading this sentence.
A benchmark comparison is obsolete before the blog post announcing it goes live. That is the actual state of things.
Who Benefits From This
Not developers. Not users. Not the people building products on top of these models.
The people who benefit are the companies releasing them.
Every release is a press cycle. Every press cycle is mindshare. Every benchmark chart is a marketing asset. The velocity itself is the strategy — not because faster iteration produces better models (it might, it might not), but because constant releases keep you in the conversation.
It is extremely expensive to be forgotten for a month in AI right now. So you ship. Whether or not the thing you’re shipping represents a meaningful improvement over the thing you shipped three weeks ago.
OpenAI retired GPT-4o after less than two years. This was a model that, when it launched, was treated as a generational leap. It changed the conversation about multimodal AI. Now it’s in the bin because only 0.1% of users were still on it. The model lifecycle has compressed to the point where “legacy” means “from last quarter.”
The Chinese Acceleration
Five of the nine February releases came from Chinese labs: Alibaba, ByteDance (twice), Zhipu AI, MiniMax. This is not a coincidence and it is not slowing down.
GLM-5 is notable not just for its benchmarks but because it was trained entirely on Huawei Ascend chips. That means Chinese AI labs are building frontier models without any NVIDIA hardware. The export controls that were supposed to slow China’s AI development are looking less like a wall and more like a speed bump that’s already in the rearview mirror.
Meanwhile the pricing is brutal. MiniMax M2.5 runs at 1/20th the cost of Claude Opus 4.6. Doubao 2.0 is 10x cheaper than GPT-5.2. These are not incremental discounts. These are prices designed to make Western API margins look like luxury goods.
The open-source angle matters too. GLM-5 is MIT licensed. Qwen3.5 has open weights. OpenAI — OpenAI! — released an Apache 2.0 model that runs on a single GPU. Two years ago Sam Altman was telling Congress that open-source frontier models were irresponsible. Now his company is shipping them because the alternative is letting Chinese labs own that entire market.
Nobody pivoted to open-source because they found religion. They pivoted because the market made closed-source untenable.
What This Actually Means For You
If you’re building on top of AI models, February 2026 should scare you a little. Not because the models are bad — they’re incredible. But because the ground is shifting under your feet every two days and the honest answer to “which model should I use?” is “it depends and also it changed since you asked.”
Some practical advice nobody’s giving:
Stop chasing the newest model. Pick one that works for your use case and stick with it until it meaningfully fails. The marginal improvement between this week’s release and last week’s release is almost never worth the integration cost.
Watch the pricing, not the benchmarks. MiniMax at 1/20th the price of Opus matters more than whether it scores 2% lower on MMLU. For most applications, “good enough and cheap” beats “best and expensive.” This has been true in every industry and AI is not special.
Build for model-agnostic architectures. If your entire product depends on one model from one provider, you are one deprecation notice away from a very bad week. Ask anyone who was still running GPT-4o pipelines last Friday.
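Concretely, that means one thin interface and a config switch. Here is a minimal sketch that assumes nothing about any particular SDK; both adapters are hypothetical shells you would fill in with real client calls.

```python
"""Provider-agnostic routing: application code depends on one
interface, and the vendor choice lives in a single config value.
Both adapters below are hypothetical shells."""
from typing import Protocol


class CompletionModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class AnthropicAdapter:
    def complete(self, prompt: str) -> str:
        # Wrap the real Anthropic client call here.
        return "..."


class OpenWeightAdapter:
    def complete(self, prompt: str) -> str:
        # Wrap a self-hosted open-weight model here.
        return "..."


# The only place a provider name appears in the codebase.
REGISTRY: dict[str, CompletionModel] = {
    "anthropic": AnthropicAdapter(),
    "open-weight": OpenWeightAdapter(),
}


def get_model(provider: str) -> CompletionModel:
    return REGISTRY[provider]


# Application code never imports a vendor SDK directly.
model = get_model("anthropic")  # swap via config on deprecation day
reply = model.complete("Draft a reply to this ticket: ...")
```

The payoff: when a deprecation notice lands, migration is a config change plus a regression run against your golden set, not a rewrite.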
The evaluation crisis is real and nobody has solved it. Public benchmarks are increasingly gamed. Vibes-based evaluation doesn’t scale. Arena-style comparisons help but can’t cover specialized use cases. If you figure out reliable, fast evaluation for your domain, that’s a genuine competitive advantage.
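If public arenas do not cover your domain, one cheap option is to run the same mechanism privately: collect pairwise preference votes on your own prompts and rank with a standard Elo update. A sketch; the votes and model names below are made up for illustration.

```python
"""A private, arena-style ranking: standard Elo updates over
pairwise preference votes. All vote data is made up."""


def elo_update(r_a: float, r_b: float, a_wins: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Shift both ratings toward the observed head-to-head outcome."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Hypothetical head-to-head votes on your own domain prompts.
votes = [
    ("opus-4.6", "minimax-m2.5", True),
    ("minimax-m2.5", "qwen3.5", True),
    ("opus-4.6", "qwen3.5", False),
]

ratings: dict[str, float] = {}
for a, b, a_wins in votes:
    r_a, r_b = ratings.get(a, 1000.0), ratings.get(b, 1000.0)
    ratings[a], ratings[b] = elo_update(r_a, r_b, a_wins)

print(ratings)  # a rough private leaderboard for your workload
```

A few hundred votes on your own prompts will tell you more about your workload than any public leaderboard, and once the votes exist the ranking costs nothing to recompute.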
The Pace Is the Point
Nine frontier models in seventeen days. The number will be higher next month.
This is not a race toward some finish line where the models get good enough and everyone relaxes. This is a permanent condition. The AI industry has entered a pace where sustained evaluation is structurally impossible and the companies releasing models know that. The confusion is a feature. If you can’t figure out which model is actually best, you default to whichever one you heard about most recently.
The winners will be the people who stop trying to stay current with every release and start building things that work regardless of which model powers them. The losers will be the people refreshing Hugging Face leaderboards at 3 AM waiting for the next drop.
The models are great. The pace is unsustainable. Both things are true and nobody in a position of power has any incentive to slow down.
So buckle up. March is going to be worse.