Power rankings are everywhere during football season. Most are opinion polls dressed up as analysis. Here's how the analytics-based approach works, and why it produces better predictions.
Power rankings attempt to order all 32 NFL teams from best to worst at a specific point in the season. That sounds like standings, but it’s fundamentally different. Standings reflect results: wins and losses, accumulated over the season. Power rankings reflect ability: how good is this team right now, regardless of their record?
A 6-3 team that won three close games against bad opponents and lost three nail-biters to playoff teams might sit at #15 in power rankings despite being 4th in their conference standings. A 4-5 team that lost five one-score games against top-10 opponents might rank higher. The ranking captures what the record doesn’t: actual performance quality.
This distinction matters most for prediction. If you want to know who will win next week, the team’s current ability matters more than their accumulated win count. A team on a three-game losing streak might still be the better team if those losses came by a combined 7 points against elite opponents. Power rankings capture this. Standings don’t.
ESPN, CBS, NFL.com, and The Athletic all publish weekly power rankings. The process is almost always the same: a panel of writers watches the games, discusses who looked good and bad, and votes. Some outlets use a single author. Others aggregate across a staff. Either way, the methodology is subjective.
These rankings have predictable failure modes:
None of this means media rankings are useless. They capture some genuine signal about team quality. But they mix that signal with noise from cognitive biases that analytics-based approaches can filter out.
Analytics rankings replace human voting with statistical models. The two dominant frameworks in NFL analytics are EPA composites and Elo ratings. They solve different problems.
EPA composites measure team quality by aggregating Expected Points Added (EPA) per play across offense, defense, and special teams. The model processes every play from the season, weights recent games more heavily, and produces a composite score. The output is a rate stat that directly measures on-field efficiency.
Elo ratings are a chess-style rating system adapted for football. After each game, the winner gains Elo points and the loser drops, with the size of the change determined by the margin of victory and the pre-game rating gap. FiveThirtyEight popularized NFL Elo. It is simple to compute but ignores play-level detail.
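To make the Elo mechanics concrete, here is a minimal sketch of a single post-game update. The K-factor of 20 and the logarithmic margin-of-victory multiplier are illustrative assumptions, not the parameters of FiveThirtyEight's or any other published model, and ties are not handled.

```python
import math


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo win expectancy for team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float, margin: int, k: float = 20.0):
    """Return (new_winner_rating, new_loser_rating) after one game.

    ASSUMPTION: k=20 and the log margin multiplier are illustrative choices.
    """
    # Surprise is large when the winner was the pre-game underdog.
    surprise = 1.0 - expected_score(winner, loser)
    # Log scaling gives blowouts diminishing returns.
    mov_multiplier = math.log(abs(margin) + 1.0)
    shift = k * mov_multiplier * surprise
    # Zero-sum: the winner gains exactly what the loser gives up.
    return winner + shift, loser - shift


if __name__ == "__main__":
    # An underdog (1400) beating a favorite (1600) by 7 moves more
    # than the favorite would have moved by winning with the same margin.
    print(elo_update(1400.0, 1600.0, 7))
    print(elo_update(1600.0, 1400.0, 7))
```

Both inputs named in the text appear here: the pre-game rating gap enters through `expected_score`, and the margin of victory enters through `mov_multiplier`.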
Other approaches include DVOA (Football Outsiders), which adjusts EPA-style efficiency for opponent strength and situation, and Bayesian team ratings that model team strength as a probability distribution with uncertainty. If any of these abbreviations are unfamiliar, the NFL stats glossary defines every metric used across analytics and betting. Most serious models combine multiple signals. The key difference from media rankings: every input is defined, every calculation is reproducible, and the system doesn’t care who played on Monday Night Football.
NoPunt’s power rankings are EPA composites computed from play-by-play data. The process:
The result is a ranking that directly answers: which teams are producing the most expected points per play and allowing the fewest? No opinion. No narrative. Just the play-by-play. You can see the full breakdown and compare any two teams on the head-to-head comparison page.
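As a rough illustration of what an EPA composite can look like in code, the sketch below ranks teams by recency-weighted EPA per play gained on offense minus EPA per play allowed on defense. The record layout, the `half_life` decay parameter, and the simple offense-minus-defense formula are assumptions made for this example; special teams are left out for brevity, and NoPunt's actual weighting is not specified here.

```python
from collections import defaultdict


def recency_weight(week: int, current_week: int, half_life: float = 6.0) -> float:
    """Weight that halves every `half_life` weeks (hypothetical decay scheme)."""
    return 0.5 ** ((current_week - week) / half_life)


def epa_composite(plays, current_week: int, half_life: float = 6.0):
    """Rank teams by weighted offensive EPA/play minus weighted EPA/play allowed.

    `plays` is an iterable of (week, offense_team, defense_team, epa) tuples.
    """
    off_sum, off_w = defaultdict(float), defaultdict(float)
    def_sum, def_w = defaultdict(float), defaultdict(float)
    for week, offense, defense, epa in plays:
        w = recency_weight(week, current_week, half_life)
        off_sum[offense] += w * epa
        off_w[offense] += w
        def_sum[defense] += w * epa  # EPA the defense allowed on this play
        def_w[defense] += w

    teams = set(off_w) | set(def_w)
    composite = {}
    for team in teams:
        gained = off_sum[team] / off_w[team] if off_w[team] else 0.0
        allowed = def_sum[team] / def_w[team] if def_w[team] else 0.0
        composite[team] = gained - allowed
    # Best team first.
    return sorted(composite.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        (1, "A", "B", 0.5), (1, "B", "A", -0.2),
        (2, "A", "B", 0.1), (2, "B", "A", 0.3),
    ]
    for team, score in epa_composite(sample, current_week=2, half_life=1.0):
        print(team, round(score, 3))
```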
One of the most useful signals in power rankings is movement. Not week-to-week (that’s noisy), but year-over-year. A team whose offensive EPA/play jumped from -0.05 to +0.12 between seasons made a real structural improvement, usually tied to a coaching change, quarterback upgrade, or offensive line overhaul.
NoPunt’s team pages show EPA trajectories across the season. When a team’s 8-game rolling EPA diverges sharply from their season-long EPA, it signals a team getting meaningfully better or worse. These inflection points are where prediction models gain their biggest edge over static rankings.
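The divergence check described above can be sketched in a few lines. This version averages per-game EPA/play values, which treats every game equally regardless of play count, and the 0.05 threshold for "sharp" divergence is a hypothetical value chosen for illustration.

```python
def rolling_vs_season(game_epa, window: int = 8):
    """Gap between the last `window` games and the full-season average.

    `game_epa` is a chronological list of per-game EPA/play values.
    Returns None until at least `window` games have been played.
    """
    if len(game_epa) < window:
        return None
    season_avg = sum(game_epa) / len(game_epa)
    rolling_avg = sum(game_epa[-window:]) / window
    return rolling_avg - season_avg


def is_inflection(game_epa, window: int = 8, threshold: float = 0.05) -> bool:
    """Flag a team whose recent form has moved away from its season baseline.

    ASSUMPTION: threshold=0.05 EPA/play is an illustrative cutoff.
    """
    gap = rolling_vs_season(game_epa, window)
    return gap is not None and abs(gap) >= threshold


if __name__ == "__main__":
    # Eight poor games followed by eight strong ones: recent form is well above baseline.
    improving = [-0.10] * 8 + [0.10] * 8
    print(rolling_vs_season(improving), is_inflection(improving))
```

A positive gap suggests a team trending up and a negative gap a team trending down; the sign matters as much as the size.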
Early-season rankings carry extra uncertainty. Through weeks 1-4, teams have only 250-400 plays of data. EPA estimates are noisy and can swing by 0.05 or more on a single game. By week 8, the signal stabilizes. By week 14, EPA composites are highly predictive of playoff outcomes. This is why NoPunt’s model uses multiple rolling windows (24, 33, and 64 games) and a three-model ensemble vote, blending fast-adapting and slow-adapting views of team strength.
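One way to picture a multi-window ensemble is to let each window rate both teams and cast a vote. The window lengths below come from the text; everything else (a plain average per window, a simple majority vote, and the function names) is an assumption made for illustration rather than a description of NoPunt's model.

```python
WINDOWS = (24, 33, 64)  # window lengths in games, as quoted in the text


def window_rating(game_epa, window: int) -> float:
    """Average EPA/play over the most recent `window` games (or fewer if unavailable).

    `game_epa` must be a non-empty chronological list of per-game values.
    """
    recent = game_epa[-window:]
    return sum(recent) / len(recent)


def ensemble_vote(team_a_epa, team_b_epa, windows=WINDOWS):
    """Return (pick, votes_for_pick) from a majority vote across windows.

    ASSUMPTION: each window votes for the team with the higher average;
    a tie within a window counts as a vote for team B.
    """
    votes_a = sum(
        1 for w in windows
        if window_rating(team_a_epa, w) > window_rating(team_b_epa, w)
    )
    votes_b = len(windows) - votes_a
    pick = "A" if votes_a > votes_b else "B"
    return pick, max(votes_a, votes_b)


if __name__ == "__main__":
    # Team A was poor for 40 games, then strong for its last 24.
    team_a = [-0.10] * 40 + [0.15] * 24
    team_b = [0.05] * 64  # steady, slightly above average
    print(ensemble_vote(team_a, team_b))
```

In this example the two shorter windows favor team A's recent surge while the 64-game window still favors team B, so the vote is split 2 to 1. A unanimous vote would indicate that fast and slow views of team strength agree.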
Power rankings are an intermediate step, not the final product. They estimate team strength. Predictions take team strength and combine it with game-specific context: home-field advantage, rest days, divisional rivalry effects, weather, and the betting line.
NoPunt’s prediction model uses EPA composites as a core input alongside these game-specific features. When the model’s computed win probability diverges significantly from the implied probability in the Vegas spread, a higher-tier pick emerges. The bigger the gap between the model’s number and the market’s number, the stronger the conviction.
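The comparison between a model probability and the market can be sketched as follows. The spread is converted to a win probability with a normal approximation, assuming final margins are distributed around the spread with a standard deviation of about 13.5 points. That value, the tier thresholds, and the "A" tier label are assumptions for illustration; the text does not state how NoPunt converts spreads or assigns tiers.

```python
import math


def spread_to_win_prob(spread: float, sigma: float = 13.5) -> float:
    """Approximate win probability for a team favored by `spread` points.

    Positive spread = favorite, negative = underdog.
    ASSUMPTION: margins are normal around the spread with std dev `sigma`.
    """
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))


def edge(model_prob: float, spread: float, sigma: float = 13.5) -> float:
    """Model win probability minus the market-implied win probability."""
    return model_prob - spread_to_win_prob(spread, sigma)


def tier(edge_value: float) -> str:
    """Map the size of the gap to a conviction tier (hypothetical thresholds)."""
    gap = abs(edge_value)
    if gap >= 0.10:
        return "A"
    if gap >= 0.05:
        return "B"
    if gap >= 0.02:
        return "C"
    return "pass"


if __name__ == "__main__":
    # Market has the team as a 3-point favorite; the model gives it a 70% chance.
    gap = edge(0.70, 3.0)
    print(round(spread_to_win_prob(3.0), 3), round(gap, 3), tier(gap))
```

A negative edge is just as informative as a positive one: it points to value on the other side of the game.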
For bettors, analytics-based rankings reveal which teams the market is consistently undervaluing or overvaluing. A team ranked 8th in EPA composites but 15th in media consensus is likely undervalued, which means its lines may be more favorable to bettors than its performance warrants. The results page tracks NoPunt’s full pick history so you can verify whether these edges translate to actual wins.
The reliability of any ranking system depends on sample size. Early-season power rankings, whether from analysts or algorithms, are inherently unreliable. Here’s how the sample grows over a season:
| Week | Cumulative plays (approx.) | EPA stability |
|---|---|---|
| Week 2 | ~130 | Very noisy. Single-game variance dominates. |
| Week 4 | ~260 | Patterns emerging but still volatile. |
| Week 8 | ~520 | Signal stabilizes. Rolling windows meaningful. |
| Week 12 | ~780 | High confidence. Trajectory data reliable. |
| Week 17 | ~1,100 | Maximum signal. Playoff projections firm. |
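A back-of-the-envelope way to see why stability improves with play count is the standard error of a mean, which shrinks with the square root of the sample size. The sketch below applies it to the play counts in the table. It assumes plays are independent and that per-play EPA has a standard deviation of roughly 1.3, an illustrative figure rather than a measured one.

```python
import math

# Approximate cumulative play counts from the table above.
PLAYS_BY_WEEK = {2: 130, 4: 260, 8: 520, 12: 780, 17: 1100}

# ASSUMPTION: rough per-play EPA standard deviation, for illustration only.
PER_PLAY_SD = 1.3


def epa_standard_error(n_plays: int, per_play_sd: float = PER_PLAY_SD) -> float:
    """Standard error of mean EPA/play, treating plays as independent draws."""
    return per_play_sd / math.sqrt(n_plays)


if __name__ == "__main__":
    for week, plays in PLAYS_BY_WEEK.items():
        se = epa_standard_error(plays)
        print(f"Week {week:>2}: ~{plays} plays, standard error about {se:.3f} EPA/play")
```

Because the error scales with one over the square root of the play count, it takes four times as many plays to cut the noise in half. That is why the estimate at week 8 is twice as tight as at week 2, while the gain from week 12 to week 17 is comparatively small.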
NoPunt handles this by incorporating prior-season data through its rolling windows. The 64-game model reaches back well over a full season (an NFL regular season is 17 games), giving it a baseline even in week 1. The 24-game model adapts faster to current-season performance. The ensemble vote blends these perspectives, which is why early-season picks are still meaningful but typically lower-confidence (B and C tiers) compared to the higher-conviction calls that emerge mid-season.
No polls. No narratives. Just EPA composites from every play of every game.