Story Points, Point Roulette, and the Great Estimation Debate

If you’ve been in a planning poker session where someone put down fifteen points for a task everyone else estimated at five, you’ve witnessed the exact moment a team’s estimation system became a political negotiation. And that moment is almost always the same. Someone realized that whoever argues for the biggest number the longest often wins. At that point the whole thing stopped being about understanding work and became theater.

Story points are genuinely useful. They’re just almost always used wrong.

What Story Points Are Actually For

Story points are meant to estimate relative complexity, not time. That’s the core principle. The idea is: we’re not saying “this takes five hours.” We’re saying “this is roughly the same complexity as other things we’ve estimated at five points, or maybe twice as complex as things at two-point-five.”

This is actually smarter than time estimation. Why? Because time estimation is almost always wrong. You estimate something at five hours and hit a weird edge case, and suddenly it’s two days. You estimate something at two days and the acceptance criteria are fuzzy, so you’re guessing. But complexity is more stable. If you say “this is similar complexity to X,” and your team agrees that X’s estimate was reasonable, you have a shared language.

The other thing story points do is make relative velocity visible. If your team averages thirty points per sprint, you can say with some confidence that next sprint you’ll probably do about thirty points’ worth of work. This lets you plan roughly. If you used time estimates instead, you’d be constantly shocked that “five hours” became two days.
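To make the velocity idea concrete, here is a minimal Python sketch. It assumes velocity is simply the average of completed points over the last few sprints. The function name forecast_velocity, the three-sprint window, and the sprint history are all hypothetical and exist only for illustration. The low and high values are a reminder that the forecast is a rough range, not a promise.

```python
from statistics import mean


def forecast_velocity(recent_sprints: list[int], window: int = 3) -> tuple[float, int, int]:
    """Return (average, low, high) completed points over the last `window` sprints.

    Hypothetical helper: a plain rolling average is one simple way to turn
    past velocity into a rough planning number.
    """
    if not recent_sprints:
        raise ValueError("need at least one completed sprint")
    sample = recent_sprints[-window:]  # only the most recent sprints
    return mean(sample), min(sample), max(sample)


# Hypothetical history of points completed per sprint, oldest first.
history = [27, 34, 29, 31, 30]
avg, low, high = forecast_velocity(history)
print(f"Plan for about {avg:.0f} points (recent range {low}-{high})")
```

With this made-up history, the last three sprints (29, 31, 30) average to thirty. That is the kind of rough planning number the paragraph above describes. A wider window smooths out noise but reacts more slowly when the team changes.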

Why Teams Fight About Them

Because most teams confuse story points with hours. They estimate something at eight points, ship it in six hours, and then claim the estimate was “wrong.” No. The estimate was about complexity, not time. You got lucky and shipped it fast, or you got unlucky and slogged for two days on something you estimated at three.

The other reason they fight: nobody ever defines what a point is. Is a “one-point story” a thirty-minute task? An hour? Two hours? Different people in the room are using different mental models. So when someone puts down three and someone else puts down thirteen, they’re sometimes not disagreeing about the complexity at all. They’re using different scales.

And then there’s the politics. In too many rooms, the senior engineer or the PM has an invisible veto. They estimate something at three, and everyone knows that if you estimate it at five, you’ll have to defend it. So you converge on three, not because you agree, but because you’re tired. And then the work overshoots the estimate by half, because three was never the right number.

Or worse: the estimates become a way to measure productivity. “You were thirteen points last sprint, only eight this sprint, what happened?” Now the team is gaming the numbers instead of being honest about complexity. Suddenly everything is an eight instead of a five because that looks better in your metrics.

What a Good Estimation Session Actually Teaches You

Here’s what matters about planning poker—the format where people estimate simultaneously and then discuss the outliers. It’s not the number. It’s what comes out when two people estimated something wildly differently.

Say you’re estimating “build an API endpoint for user authentication.” One engineer puts down two, another puts down thirteen. They can’t both be right. So you ask: what did you see that I didn’t?

The person who said two: “We can just add a route, call the existing auth service, return a token.”

The person who said thirteen: “Yeah, but we need to handle the case where the auth service is down. We need to rate-limit failed attempts. We need to log the request. We need to add tests.”

Aha. The two-pointer missed the complexity entirely. The thirteen-pointer was maybe a bit high, but they saw the actual problem. So you talk through it, refine it to a five, and now you ship the endpoint faster because you actually understood what needed doing. The estimate matters less than the conversation.

Most teams skip the conversation. The thirteen-pointer gets talked down, everyone converges on a number somewhere in the middle, and six weeks later it’s a mess because nobody actually thought through the edge cases.

The Broken Ritual of Point Roulette

Planning poker should surface disagreement. When it doesn’t—when the team converges quickly every time—either the work is very clear, or people aren’t thinking independently before estimating.

Some teams have learned to use planning poker as a way to hide disagreement. Everyone estimates, but then the most senior person or the PM says, “That doesn’t look right to me,” and the team changes their estimate without actually discussing why the gap exists. The whole point of seeing the spread—of understanding where the disagreement is—gets lost.

Or the team estimates in silence and then discusses, but the discussion becomes negotiation instead of investigation. “Let’s meet in the middle” when what you should be doing is understanding what the two-pointer saw that the thirteen-pointer missed, and vice versa.

The ritual becomes theater. Everyone puts down a number, everyone talks for five minutes, everyone converges. The ceremony is complete. You move on. And next sprint, you’re confused about why the estimate was off.

The Honest Version

A team that uses story points well treats estimation as a thinking tool, not a prediction tool. They estimate something. If there’s disagreement, they dig into it. They might refine the work, split it into smaller pieces, or just accept that there’s real complexity here and they’d better account for that in their capacity planning.

They don’t use estimates to reward or punish people. They don’t treat “you were thirteen points last sprint, only eight this sprint” as a productivity signal. They treat points as data: we estimated this work as complex, we were right, and so we finished less. That’s fine. Next sprint, let’s plan our capacity with that in mind.

They also don’t confuse estimation with commitment. “We estimated it at eight points” doesn’t mean “we promise to ship it in this sprint.” It means “we think this is roughly this complex.” If you estimate something at eight and then discover it’s actually twenty, that’s not a failure. That’s learning.

The real question you should be asking about story points isn’t “are we estimating accurately?” It’s “are we understanding the work before we start?” If estimation is forcing that conversation, great. If estimation has become a number that nobody questions, a political negotiation, or a way to measure productivity, you should probably stop using it.

Some teams skip story points entirely. They use t-shirt sizes (S, M, L, XL) or just say “this is small” and “this is large.” That works too. What matters is that the team has a shared language for complexity, that disagreement leads to conversation, and that nobody is gaming the numbers.

The ceremonies are supposed to help you ship better, faster. The moment the ceremony becomes the point—the moment you’re optimizing for estimation accuracy instead of understanding—you’ve lost the plot.

Ashlee Lane

Ten-plus years in LMS & learning technology, now navigating the world of product management and operations in SaaS. Writing about systems, people, and the art of getting things done.