An Introduction to Statistical Modeling for Basketball Handicappers
Successful sports bettors can use models to help them make good betting decisions. A well-designed model lets you plug in information to help you predict the outcomes of games and helps you find lines that offer value.
Models come in all shapes and sizes, and the most common ones use statistics. This page is designed to give you an introduction to statistical modeling for basketball games. It can help you build a model for NBA games, NCAA games, and games in any other basketball league around the world.
This page isn’t designed to give you an exact way to beat the sportsbooks. You’re never going to find a successful model in print, because if too many people use the same model, the books adjust their lines to make it unprofitable.
You have to develop and test your own models until you find one or more that work. When you find one that works, you should protect it like it’s gold.
Once you finish reading this page, you’ll have what you need to start developing and testing your own statistical models. More statistics are readily available now than ever before, and you can look up detailed numbers on every team and player in minutes.
Once you build a basketball betting statistical model, all you have to do is plug in the relevant stats, and it gives you information you can use to make betting decisions. You can even program a spreadsheet to make everything easier.
I’m going to use the NBA throughout this page, but the exact same process can be used for NCAA basketball and any other league where you can gather enough information.
The Basis for Basketball Modeling
I use a per-minute base for everything in my basketball statistical modeling. In other words, I break every statistic I use down to a per-minute number.
Here’s a list of some of the things I use:
- Points per minute
- Rebounds per minute
- Assists per minute
- Steals per minute
- Fouls per minute
- Blocks per minute
- Free throws per minute
- Shots per minute
I also use field goal percentage, three-point percentage, and free throw percentage. I use all of these things and more for individual players and teams, and I separate them for home and away games.
Here’s an example of how I break down a player for statistical modeling. This is using Ben Simmons’ numbers from the 2017-2018 season.
Minutes per game | 33:44 |
FG made per minute | .1988 |
FG attempted per minute | .3647 |
FG % | .545 |
3P made per minute | 0 |
3P attempted per minute | 0 |
3P % | 0 |
FT made per minute | .0712 |
FT attempted per minute | .1245 |
FT % | .560 |
Rebounds per minute | .2401 |
Off rebounds per minute | .0534 |
Def rebounds per minute | .1868 |
Assists per minute | .2431 |
Turnovers per minute | .1008 |
Steals per minute | .0504 |
Blocks per minute | .0256 |
Points per minute | .4682 |
Games played / possible | 81/82 |
This is just the overall breakdown for Ben Simmons. You need to continue breaking each player down using the same categories for home and road games.
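If you’d rather script the per-minute breakdown than build it in a spreadsheet, here’s a minimal sketch of the arithmetic. The function names and the sample per-game averages are made up for illustration. The last two lines use the field goal rates from the table above to show that a shooting percentage can be recovered from per-minute makes and attempts, because the minutes cancel out.

```python
def parse_minutes(clock: str) -> float:
    """Convert a 'MM:SS' string such as '33:44' to decimal minutes."""
    minutes, seconds = clock.split(":")
    return int(minutes) + int(seconds) / 60


def per_minute(per_game_averages: dict, minutes_per_game: float) -> dict:
    """Divide each per-game average by minutes per game."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return {
        stat: round(value / minutes_per_game, 4)
        for stat, value in per_game_averages.items()
    }


# Hypothetical per-game averages for an illustrative player (not real data).
example_per_game = {"points": 18.0, "rebounds": 9.0, "assists": 6.0}
example_rates = per_minute(example_per_game, parse_minutes("30:00"))
print(example_rates)

# Shooting percentage from the per-minute rates in the table above.
fg_pct = 0.1988 / 0.3647
print(round(fg_pct, 3))
```

Run the same function on a player’s home-only and road-only averages to get the split breakdowns.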
You should also break down every team, both at home and on the road, using the same categories. Other things to track by team include:
- Team defense including FG % against and points per game allowed
- Team offense including FG % and points per game
- Team three point attempts and percentage
- Team rebounds
- Fouls per game committed and drawn
- Performance in the conference and out of the conference as well as division
This seems like a great deal of work, but if you take the time to set up a spreadsheet with the correct formulas, all you have to do is plug in the numbers, and the spreadsheet does all of the hard work. You can even set things up so you can automatically feed the spreadsheet if you know what you’re doing. If you don’t, you can hire someone for a one-time cost to set this up and use it forever.
Once you have all of the players and teams broken down, you can start using the information to build a model.
If every player played every game and never got injured, you could just use the team statistics for your model. But if this happened, the sportsbooks could easily build the same model, and the lines would be even tighter than they are now.
The power this type of modeling gives you is that you build the expected results of an upcoming game based on the predicted starting lineups, and you can play with the expected number of minutes for key players. This isn’t an exact science, and it takes a great deal of work to get good at it, but this is a good thing.
You want your model to be unique, so the chances of other sports bettors using it are small.
Here’s an extended example of how you might build a simple basketball statistical model using the information you’ve learned so far.
The Sixers are hosting the Celtics, so I use the stats for the Sixers home players and the Celtics road players. I take the projected starting lineups and use their minutes per game to start. Then I make any adjustments to the expected minutes for each player based on injury reports, recent rest, and recent actual minutes played.
Once I fill in the expected or predicted minutes from the starting lineup for each team, I fill in the rest of the minutes from the bench players. I use the same type of things to predict which bench players are available and how many minutes they should play.
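One way to keep the minutes honest is to make sure they add up. An NBA regulation game has 240 player-minutes per team (five players on the floor for 48 minutes). The sketch below assumes you’ve already set the starters’ minutes by hand, and it scales the bench players’ usual minutes to fill whatever is left. The function name and the player labels are hypothetical, and proportional scaling is only one possible rule.

```python
REGULATION_TEAM_MINUTES = 5 * 48  # five players on the floor for 48 minutes


def fill_bench_minutes(starter_minutes: dict, bench_usual_minutes: dict) -> dict:
    """Scale bench players' usual minutes so the team total reaches 240."""
    remaining = REGULATION_TEAM_MINUTES - sum(starter_minutes.values())
    if remaining < 0:
        raise ValueError("starters already exceed 240 minutes")
    usual_total = sum(bench_usual_minutes.values())
    if usual_total <= 0:
        raise ValueError("bench players need some usual minutes to scale")
    scale = remaining / usual_total
    projected = dict(starter_minutes)
    for name, usual in bench_usual_minutes.items():
        projected[name] = round(usual * scale, 1)
    return projected


# Hypothetical lineup: starters total 160 minutes, leaving 80 for the bench.
starters = {"S1": 36, "S2": 34, "S3": 32, "S4": 30, "S5": 28}
bench = {"B1": 24, "B2": 20, "B3": 12, "B4": 8}
projected_minutes = fill_bench_minutes(starters, bench)
print(projected_minutes)
```

For college games or other leagues, change the constant to match the game length.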
At the simplest level, you can predict the final score using what I’ve set up so far. You have a predicted number of minutes for each player on each team, so you simply calculate the predicted final score.
Suppose you predict Simmons will play 36 minutes instead of his usual 33:44, because the teams are evenly matched and the game should be tighter than most. At .4682 points per minute, that projects to 16.86 points. You could round it to 17, but when I run numbers for the entire team and game, I leave it as 16.86, even though I know he can’t score a fraction of a point.
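The arithmetic behind that projection is just points per minute times predicted minutes, summed over everyone expected to play. Here’s a minimal sketch. The function names are mine, and the second player in the team example is a made-up placeholder.

```python
def projected_points(points_per_minute: float, projected_minutes: float) -> float:
    """Expected points for one player: per-minute rate times predicted minutes."""
    return points_per_minute * projected_minutes


def projected_team_score(players: list) -> float:
    """Sum the unrounded projections for every (rate, minutes) pair."""
    return sum(projected_points(rate, minutes) for rate, minutes in players)


# Simmons at .4682 points per minute, predicted to play 36 minutes.
simmons_points = projected_points(0.4682, 36)
print(round(simmons_points, 2))  # 16.86

# Simmons plus one hypothetical teammate (0.5 points per minute, 30 minutes).
partial_team = projected_team_score([(0.4682, 36), (0.5, 30)])
print(round(partial_team, 2))
```

Keeping the fractions until the end avoids stacking up rounding errors across ten or more players.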
At this point, you have a simple prediction of the final score. But this isn’t really helpful because it doesn’t take anything into account other than average points per minute for the players expected to play.
The next step is comparing the assists per minute, turnovers per minute, rebounds per minute, blocks per minute, and steals per minute totals for each team based on the minutes you predict each player to play.
When you compare these predictions as a whole for each team, it gives you a better idea of which team has the edge overall. You need to be aware that assists can be somewhat misleading, so I don’t weight the assist comparison as much as the others.
I want to know which team has an edge in overall rebounding as well as offensive rebounding. Steals, blocks, and turnovers are all related, so I compare them as a group between the two teams.
This information helps me make adjustments to the final score prediction. You can keep adding more details and make additional comparisons to fine tune your model.
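Here’s a rough sketch of how those comparisons could be computed. Everything specific in it is an assumption for illustration: the 0.5 weight on assists, the way steals, blocks, and turnovers are combined into one group (steals plus blocks minus turnovers), and the one-player rosters. Treat it as a starting point to tune, not a finished formula.

```python
CATEGORIES = ("assists", "rebounds", "off_rebounds", "steals", "blocks", "turnovers")


def team_totals(players: list) -> dict:
    """Sum each per-minute rate times predicted minutes across a roster."""
    return {c: sum(p["minutes"] * p[c] for p in players) for c in CATEGORIES}


def compare_teams(home: list, away: list, assist_weight: float = 0.5) -> dict:
    """Positive numbers favor the home team. assist_weight is an assumed value."""
    h, a = team_totals(home), team_totals(away)

    def group(t: dict) -> float:
        # Assumed grouping: steals and blocks help, turnovers hurt.
        return t["steals"] + t["blocks"] - t["turnovers"]

    return {
        "assists": assist_weight * (h["assists"] - a["assists"]),
        "rebounds": h["rebounds"] - a["rebounds"],
        "off_rebounds": h["off_rebounds"] - a["off_rebounds"],
        "steals_blocks_turnovers": group(h) - group(a),
    }


# Hypothetical one-player rosters, just to show the shape of the data.
home_roster = [{"minutes": 30, "assists": 0.2, "rebounds": 0.3, "off_rebounds": 0.1,
                "steals": 0.05, "blocks": 0.02, "turnovers": 0.1}]
away_roster = [{"minutes": 30, "assists": 0.1, "rebounds": 0.2, "off_rebounds": 0.05,
                "steals": 0.04, "blocks": 0.01, "turnovers": 0.12}]
edges = compare_teams(home_roster, away_roster)
print(edges)
```

How you turn these edges into points added to or subtracted from the score prediction is something you’ll have to test for yourself.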
You also need to use other things that aren’t strictly statistical if you want to be as accurate as possible. I cover some of these things in the non-statistical section below.
Even though this example might seem in depth, it’s a simple one. Your model needs to start like this example and then go much deeper. You can add to it slowly or quickly, but a simple model like this isn’t very powerful on its own.
What you need to learn from this is how to start a model and build out your own. This example is here to show you how to get started and help you understand how statistical models can be used. Start building and testing your models today. It’s the only way to develop one that helps you make money. The next section helps you understand how testing is important and how to do it.
Testing
Every statistical model must be tested as much as possible. You can test them as soon as you build them by using them to predict future games. But you can also test them based on past results. This is called back testing.
Back testing can give you a good idea of how a model performs, but it’s never a guarantee of future results. When you back test a system, the further back you go, the less likely the results are to be accurate. NBA and major college basketball lines and results don’t stay constant.
What I mean is that every season, month, and game is different. Sportsbooks evolve quickly because they’re designed to maximize their profit. They also adjust lines based on current situations and betting volume. This is why back testing can only tell you so much.
I never back test more than one year. You can experiment with longer times, but I simply don’t trust any results older than a year.
The other issue with back testing is that it can be difficult to find accurate line information and details about what happened before the game. The best thing you can do is start collecting as much information as possible about lines and games immediately so you can use it in the future for back testing. Keep a spreadsheet with lines for every game and make notes for each game including injury details and anything else you use in your model.
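Once you have lines and results saved, grading a back test against the spread is simple bookkeeping. The sketch below makes several assumptions you should adjust: every bet is one unit at -110, the spread is quoted from the home team’s side (so -3.5 means the home team is favored by 3.5), margins are home score minus away score, and the model only bets when it disagrees with the line by at least 2 points. The field names and sample games are invented.

```python
def grade_back_test(games: list, threshold: float = 2.0) -> dict:
    """Grade one-unit spread bets at an assumed price of -110."""
    win_payout = 100 / 110  # profit on a winning one-unit bet at -110
    wins = losses = pushes = 0
    profit = 0.0
    for game in games:
        # How far the model's margin is from the line, from the home side.
        edge = game["predicted_margin"] + game["home_spread"]
        if abs(edge) < threshold:
            continue  # model agrees with the line too closely: no bet
        result = game["actual_margin"] + game["home_spread"]
        if result == 0:
            pushes += 1  # stake returned
            continue
        bet_home = edge > 0
        home_covered = result > 0
        if bet_home == home_covered:
            wins += 1
            profit += win_payout
        else:
            losses += 1
            profit -= 1.0
    bets = wins + losses + pushes
    roi = profit / bets if bets else 0.0
    return {"wins": wins, "losses": losses, "pushes": pushes,
            "profit": profit, "roi": roi}


# Hypothetical saved games (not real lines or results).
saved_games = [
    {"predicted_margin": 6, "home_spread": -3.5, "actual_margin": 7},
    {"predicted_margin": -1, "home_spread": -4, "actual_margin": 10},
    {"predicted_margin": 3, "home_spread": -2.5, "actual_margin": 5},
    {"predicted_margin": 2, "home_spread": 3, "actual_margin": -3},
]
back_test = grade_back_test(saved_games)
print(back_test)
```

A handful of games proves nothing either way, so run this over as many saved games as you trust.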
The more information and detail you can collect for back testing purposes, the better. You should also keep detailed notes as you test your model moving forward. I’ve made the mistake often of seeing something and saying to myself that I’d remember it.
It’s always safer to make a note. I make notes in my spreadsheet and on paper. I also back up my spreadsheet at least once a week. Backing it up every day would be smarter, but I haven’t gotten to that point yet. I do know that it sucks to lose a few days’ worth of work or more when your computer crashes.
You need to test and test your statistical models from now until you stop betting. Even if you develop the best model in history, you need to keep testing it forever. And you should constantly be working on new models.
When testing models, it’s best to test one change at a time. When you change two or more things, it can be difficult to determine which change altered the results. Even if you change two things and the results don’t change, it’s possible that one change helped and the other hurt, and they canceled each other out.
It’s possible that you could add one of the two things and improve your results, but you’ll never learn this unless you test them separately.
If you have two different things you want to test in your model, set up four different models. Leave one model the same, make the second model use one new variable, make the third model use the second new variable, and make the fourth model use both new variables. This way, you can see what each variable does on its own and what the two do together.
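The four-model setup is every on/off combination of the two new variables, and a few lines of code can list those combinations for you. This is a minimal sketch, and the variable names `rest_days` and `travel_distance` are placeholders for whatever you want to test.

```python
from itertools import product


def model_variants(new_variables: list) -> list:
    """Return every on/off combination of the new variables."""
    variants = []
    for flags in product((False, True), repeat=len(new_variables)):
        variants.append(dict(zip(new_variables, flags)))
    return variants


variants = model_variants(["rest_days", "travel_distance"])
for number, variant in enumerate(variants, start=1):
    print(number, variant)
```

The first variant has both variables off (your unchanged model) and the last has both on. The count doubles with each variable you add, so three new variables means eight models.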
It might seem overwhelming to run four different models for two new variables, but once you get everything set up correctly, you should be able to use computer power to crunch all of the numbers.
It’s possible to test hundreds of different variables in your model using spreadsheets, computer power, and some simple programming. Just make sure you keep detailed notes with each model variation, so you can keep everything straight.
As model variations fail, replace them with new ones. Develop multiple profitable models and then test combining parts of them to keep improving. This is a numbers game at the core, and if you keep testing and improving, you can build a profitable statistical model.
Non-Statistical Additions
While some basketball bettors try to design a statistical model that relies 100% on numbers, I’ve never been able to completely perfect one. My guess is that most others haven’t been able to, either. And there’s a good reason for this, which you’re getting ready to learn.
Suppose you’ve developed a statistical model and are getting ready to make a bet on a game between the Houston Rockets and San Antonio Spurs. It’s an hour before the game, and James Harden is having tightness in his back and isn’t going to play.
Everything in your model has assumed that Harden is going to play. Depending on the complexity of your model, you might be able to quickly change the input to remove Harden from the game, but there’s a big problem with this.
Harden is responsible for such a large percentage of Houston’s offense that simply removing him from the model isn’t going to give you an accurate picture of the expected results. You can build a model that predicts results over an entire game based on numbers per minute like I suggest, but the truth is that you have no way of knowing how well the predictions will do when the model tries to replace Harden’s minutes with someone else.
This is somewhat hard to explain, so bear with me for a minute. Whichever player takes Harden’s spot in the lineup can’t equal his results because he’s one of the top scorers in the league. In addition, the backup’s current numbers per minute have been created mostly with Harden in the lineup. Some of the minutes have come with Harden on the floor, and some have come while Harden takes a break and the defense may be worn down from guarding him.
When an impact player like Harden is a scratch, I simply don’t bet on the game. The purpose of a statistical model isn’t to help you bet on every game on the schedule. It’s designed to help you identify some games on the schedule that offer value on the offered betting lines.
Another area to consider is teams that are playing the second game of a back-to-back. Though you can build this into your model, sometimes you don’t have enough statistical data to make it valuable. This should also be broken down by home and away, but with the current NBA schedule, you don’t have a great statistical data set until later in the season.
Everyone knows that teams on the second game of a back-to-back schedule don’t perform as well, but how much drop-off can be expected? You can use historical data for this, but every season is different, and every team is different.
You also need to account for how rested the main players are from the previous game. Did the main players have to fight through four periods and overtime in their last game, or was it a blowout where they all rested after three periods?
It’s difficult to predict streaks in basketball, but there’s no doubt that some players and teams are more likely to go on a streak than others. Some shooters get hot and stay on fire for a week or more, while at other times they might go several games with poor results.
Statistics do a good job of showing averages of results, but they don’t show the smaller streaks. Some advanced baseball statistical models weigh a starting pitcher’s last start and last three starts more heavily than earlier starts. It’s quite complicated, but you might consider trying the same thing when building your basketball model.
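If you want to experiment with this in basketball, one simple approach is a weighted average that counts a player’s most recent game and last three games more heavily than older ones. The weights below (3 for the last game, 2 for the two before it, 1 for everything older) are arbitrary assumptions to test, not values taken from any existing model, and the sample numbers are invented.

```python
def recency_weighted_average(values: list, last_weight: float = 3.0,
                             recent_weight: float = 2.0,
                             base_weight: float = 1.0) -> float:
    """Weighted average of per-game values ordered oldest to newest."""
    if not values:
        raise ValueError("values must not be empty")
    count = len(values)
    weights = []
    for index in range(count):
        games_ago = count - 1 - index  # 0 means the most recent game
        if games_ago == 0:
            weights.append(last_weight)
        elif games_ago < 3:
            weights.append(recent_weight)
        else:
            weights.append(base_weight)
    weighted_sum = sum(v * w for v, w in zip(values, weights))
    return weighted_sum / sum(weights)


# Hypothetical points per minute over a player's last five games.
recent_rates = [0.40, 0.40, 0.50, 0.50, 0.60]
weighted_rate = recency_weighted_average(recent_rates)
plain_rate = sum(recent_rates) / len(recent_rates)
print(round(weighted_rate, 4), round(plain_rate, 4))
```

For this improving player, the weighted rate comes out higher than the plain average. Whether that predicts anything is exactly the kind of question to settle by testing.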
Though most basketball bettors don’t use it in their models, I always look at how far teams need to travel and how many time zones they cross to play on the road. When the Lakers travel across town to play the Clippers, it’s a great deal different than traveling across the country to play the Celtics.
Sadly, there are hundreds of little things that a statistical model can’t predict. Injuries are always a big part of betting on basketball, but they aren’t the only thing.
The best basketball bettors who use a statistical model also combine the predictions with as many other considerations as possible. In other words, a statistical basketball betting model is only one tool for you to use. If you want to be a winning basketball bettor, you need to use as many tools and as much information as possible.
Conclusion
Building a statistical basketball betting model isn’t easy. But if you can build one or more successful ones, it can help you win a great deal of money. Use everything you learned on this page and start building and testing a model or two right now.
Track your results, back test your model, and keep improving it as you learn more. Keep testing new variables using the power of computers, spreadsheets, and programming to improve your results. It might take several months or even years to develop a solid model. But all of the work will be worth it when you start seeing winning results.
The sportsbooks get smarter, and the lines get tighter. This means that you’re never going to be done with your model. This doesn’t mean you can’t build one that works for years, but you always need to be looking for another edge and adjusting along with the sportsbooks.