The Model You Love Is Probably Just the One You Use
The following article originally appeared on Medium and is being republished here with the author’s permission.
Ask 10 developers which LLM they’d recommend and you’ll get 10 different answers—and almost none of them are based on objective comparison. What you’ll get instead is a reflection of the models they happen to have access to, the ones their employer approved, and the ones that influencers they follow have been quietly paid to promote.
We’re all living inside recursively nested walled gardens, and most of us don’t realize it.

The access problem
In corporate environments, model selection often happens by accident. Someone on the team tries Claude Code one weekend, gets excited, tells the group on Slack, and suddenly the whole organization is using it. Nobody evaluated alternatives. Nobody ran a bakeoff. The decision was made by whoever had a company card and a free Saturday.
That’s not a criticism—it’s just how these things go. But it means that when that same person tells you their favorite model, they’re really telling you which model they’ve had the most reps with. There’s a genuine learning function at play: You get faster, your prompts get better, and the model starts to feel almost intuitive. It’s not that the model is objectively superior. It’s that you’ve gotten good at using it.
This matters more than people admit, because a lot of this space runs on feelings rather than evidence. People feel good about Opus right now. It feels powerful; it feels smart; it feels like you’re using the best tool available. And maybe you are. But ask someone who’s paying for their own tokens whether they feel the same way, and you tend to get a more calibrated answer. Skin in the game has a way of sharpening opinions.
The influence problem
There’s also a lot of money moving through this space in ways that don’t always get disclosed. Model providers are spending real budget to make sure the right people have the right experiences—early access, credits, invitations to the right events. Anthropic does it. OpenAI does it. This isn’t a scandal; it’s just marketing, but it muddies the signal considerably. When someone you follow is effusive about a model, it’s worth asking whether they arrived at that opinion through sustained use or through a curated demo environment.
Meanwhile, some developers—especially those building in the open—will use whatever doesn’t cost an arm and a leg. Their enthusiasm for a model might be more about its pricing tier than its capability ceiling. That’s also a valid signal, but it’s not the same signal.
The alignment problem (the other one)
Then there are the geopolitical considerations. Some developers are deliberately avoiding Qwen and GLM due to concerns about the countries they originate from. Others are using them because they’re compelling, capable models that happen to be dramatically cheaper. Both camps think the other is being naive. This is a real conversation that doesn’t have a clean answer, but it’s happening mostly under the surface.
What I’ve actually been doing
I’ve been forcing myself to test outside my comfort zone. I’ve spent the last week using Codex seriously—not casually—and my experience so far is that it’s nearly indistinguishable from Claude Sonnet 4.6 for most coding tasks, and it’s running at roughly half the cost when you factor in how efficiently it uses tokens. That’s not a small difference. I want to live with it longer before I have a firm opinion, but “a week” is the minimum threshold I’d set for any model evaluation. Anything less and you’re just rating your first impression.
I’ve also started using Qwen and GLM-5 seriously. Early results are interesting. I’ve had some compelling successes and a few jarring errors. I’ll reserve judgment.
What I’ve noticed with my own Anthropic usage is something worth naming: I default to Haiku for well-scoped, mechanical tasks. Sonnet handles almost everything else with room to spare. Opus only comes out when I need genuine breadth—architecture questions, strategic framing, anything with a genuinely wide scope. But I’ve watched people in corporate environments leave the dial on Opus permanently because they’re not paying for tokens themselves. And here’s the thing—that’s actually not always to their advantage. High-powered models overthink simple tasks. They’ll add abstractions you didn’t ask for, restructure things that didn’t need restructuring. When I have a clearly templated class to write, Haiku gets it right at a tenth of the cost, and it doesn’t second-guess the design.
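To make that habit concrete, here's a minimal sketch of routing tasks by scope rather than leaving the dial on the biggest model. It assumes the Anthropic Python SDK and an API key in the environment; the model IDs and the routing table are placeholders for illustration, not a recommendation of specific versions.

```python
# A minimal sketch of "match the model to the task."
# Assumes: pip install anthropic, ANTHROPIC_API_KEY set in the environment.
# The model IDs below are placeholders -- check the current model list.
import anthropic

client = anthropic.Anthropic()

# Hypothetical routing table: cheap tier for mechanical work, mid tier as the
# default, top tier only for genuinely wide-scope questions.
MODEL_FOR_SCOPE = {
    "mechanical": "claude-haiku-4-5",   # placeholder ID
    "default": "claude-sonnet-4-5",     # placeholder ID
    "broad": "claude-opus-4-5",         # placeholder ID
}

def run_task(prompt: str, scope: str = "default") -> str:
    """Send a prompt to the cheapest model that fits the task's scope."""
    response = client.messages.create(
        model=MODEL_FOR_SCOPE.get(scope, MODEL_FOR_SCOPE["default"]),
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# A well-scoped, templated class goes to the cheap tier:
print(run_task("Write a Python dataclass for a 2D point with float x and y.",
               scope="mechanical"))
```

The point isn't the specific tiers; it's that the default should be the smallest model that does the job, with escalation as a deliberate choice rather than a standing setting.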
The thing we should be talking about
Last month everyone was exercised about what Sam Altman said about energy consumption. Fine. But I think the more pressing question is about marketing budgets and how they’re distorting the collective understanding of these tools. The benchmarks are starting to feel managed. The influencer coverage is clearly shaped. The access programs create a positive bias among the people with the largest audiences.
None of this means the models are bad. Some of them are genuinely remarkable. But when you ask someone which model to use, you’re getting an answer that’s filtered through their employer’s procurement decisions, the influencers they follow, what they can afford, and how long they’ve been using that particular tool. The answer you get tells you a lot about their situation. It tells you almost nothing about the model.
Take it all with appropriate skepticism—including this post.
