Do Free Throws Win Basketball Teams Games?

Introduction

With March Madness upon us (this is being published on Friday, March 22), we are being treated to three weeks of the very best college basketball. Whether it’s a potential first-round upset or a nail-biting Final Four match-up, close games are a staple of March Madness; many argue it’s what makes the tournament so great. Free throws can often be the most decisive factor in what team wins a close game. Or are they? While there are many clichés about the importance of a team making free throws, I take a closer look at whether or not free throws really do win games.

Methods

An article by the NCAA says that in 2018 teams shot, on average, 19.6 free throws and made 14. With these averages in mind, I will be testing these conditions under the null hypothesis by analyzing the relationship between the team wins in an individual season and if they made more than 14 free throws in a game. The goal of this test is to determine the importance, or potential lack thereof, of making free throws.

Data

For this experiment, I have collected data of each regular season game from the 2017-2018 Louisiana State University (LSU) men’s basketball season. The reason for this is LSU made 13.9 per game that season, very close to the overall NCAA average. In this data set, we have opponent faced, free throws made, free throws attempted, and the outcome of the game (Win or Loss).

Rows: 30 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Opponent, Result
dbl (2): FTM, FTA

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 30 × 4
   Opponent            FTM   FTA Result
   <chr>             <dbl> <dbl> <chr> 
 1 Alcorn State         16    24 Win   
 2 Samford              19    20 Win   
 3 Michigan             16    23 Win   
 4 Notre Dame            9    12 Loss  
 5 Marquette            23    26 Loss  
 6 UT Martin            12    20 Win   
 7 UNCW                 17    20 Win   
 8 Houston              17    20 Win   
 9 Stephen F. Austin    23    29 Loss  
10 Sam Houston State    11    20 Win   
# ℹ 20 more rows

lsu_basketball <- lsu_basketball |>
  mutate(ftpct = (FTM / FTA),
         ftm_average = ifelse(FTM > 13, "Yes", "No"),
         ftpct_average = ifelse(ftpct > .713, "Yes", "No"))

view(lsu_basketball)

Along with our initial variable, this code calculates the free throw percentage for each game. And, using the ‘ifelse’ statement within mutate, we can also see whether or not both free throws made and free throw percentage was better or worse than the national average (14 Free Throws Made and 71.4%); “Yes” denotes better than the average, “No” denotes worse than average.

ftm_summary <- lsu_basketball |>
  group_by(ftm_average) |>
  summarize(win_rate = mean(Result == "Win"))

lsu_diff <- ftm_summary[[2]][[2]] - ftm_summary[[2]][[1]]
lsu_diff

[1] -0.06666667

Conveniently, of the 30 regular season games LSU played, they made more than the average 15 times and less than the average 15 times. To calculate the win percentage of games in which they shot above or below the average, we find the mean of wins when grouped by the averages. Following this, ‘lsu_diff’ is the difference between the win percentage when LSU shot above the average and when LSU shot below the average. From the data, we can see that, shockingly, LSU won more games when making a below average quantity of free throws.

ftm_simulation <- vector("double", 5000)
for(i in 1:5000) {
  ftm_summary <- lsu_basketball |>
    mutate(ftm_average = sample(ftm_average)) |>
    group_by(ftm_average) |>
    summarize(win_rate = mean(Result == "Win"))
   ftm_simulation[[i]] <- ftm_summary[[2]][[2]] - ftm_summary[[2]][[1]]
}

view(ftm_simulation)

As we are testing the condition of making free throws under the null hypothesis, the next step in our experiment is running simulations to create a null distribution. This code runs 5000 simulations, each simulation randomizing the difference of a team’s win percentage when making the average and not making the average in a hypothetical season.

ftm_simulated <- tibble(ftm_simulation = ftm_simulation)

ggplot(ftm_simulated, aes(x = ftm_simulation)) +
  geom_histogram(aes(y = ..density..), alpha = .4, fill = "black") +
  geom_density(fill = "purple", alpha = .3) +
  geom_vline(xintercept = lsu_diff, color = "yellow") +
  labs(title = "Free Throws Made Null Distribution", x = "FTM Differential", y = "Density")

Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p_value <- sum(ftm_simulation >= lsu_diff) / 5000
p_value

[1] 0.7632

The final step is to plot our null distribution alongside our value for LSU’s win differential using a vertical line. When looking at the graph, we can see the vertical line fits right with the null distribution, suggesting that the chance of LSU’s win correlation with free throws is certainly possible under null conditions. We can find a mathematical value for this using the p-value. Seen up above, this is a gigantic p-value that means that more than 75% of the time would we expect to see this win differential under null conditions, meaning we absolutely can not reject the null hypothesis. To interpret this in context of basketball statistics, we can conclude that making more than the average number of free throws in a game may not do much to a team’s probability of winning games.

Other Uses

While free throws made were not much a factor in LSU winning or losing games, we can use the code up above to see if other variables may have a greater impact on determining the results of LSU games by replacing free throws made with a different variable. In this case, I use free throw percentage, and its relation to the national average, to see if that has a greater correlation with game results.

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[1] 0.7986

Results

Similar to the histogram comparing free throw average, we can see that LSU’s differential with free throw percentage also falls right within the null distribution. When looking at the p-value, this is evident by the huge value. This allows us to conclude that making above the national average in free throws and free throw percentage does little to impact the results of college basketball games.

Try it Yourself!

Following my analysis of the 2017-18 LSU season, I was shocked by the lack of relationship between free throw metrics and the outcomes of games. There may be a few ways to explain this; namely, as referenced by the NCAA article, the amount of free throws teams are shooting is decreasing. As basketball has become more spread out, thanks to a revolution of the Three-Pointer led by Stephen Curry, teams are spending less time near the basketball and, thus, decrease their chances of getting fouled. Luckily, we can run tests for any team in any season. To do this, I have created a series of functions (seen below) that hasten the coding process. These functions condense the code and allow for the user to include what ever variables they want, albeit with a few conditions. With these functions, I will walk through an application of how to use these function using data from the 2007-08 University of Missouri (colloquially known as “Mizzou”) men’s basketball regular season.

ftpcts <- function(data, var1, var2) {
  data |>
  mutate(ftpct = {{ var1 }} / {{ var2 }},
         ftm_average = ifelse({{ var1 }} > 13, "Yes", "No"),
        ftpct_average = ifelse(ftpct > .713, "Yes", "No"))
}

ftm_summary_func <- function(data, var1, var2) { 
  ftm_summary <- data |>
  group_by({{ var1 }}) |>
    summarize(win_rate = mean({{ var2 }} == "Win"))
}

ft_histogram <- function(data, xvar, alpha = .4, var) {
  ggplot(data = data, aes(x = xvar)) +
    geom_histogram(aes(y = ..density..), alpha = alpha, fill = "black") +
    geom_density(fill = "blue", alpha = alpha) +
    geom_vline(xintercept = {{ var }}, color = "red") 
    
}

p_value <- function(var1, var2) {
  sum({{ var1 }} >= {{ var2 }}) / 5000
}

Rows: 31 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): team, result
dbl (2): free_throws_made, free_throws_att

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 31 × 4
   team             free_throws_made free_throws_att result
   <chr>                       <dbl>           <dbl> <chr> 
 1 Central Michigan               21              27 Win   
 2 Fordham                         7              10 Win   
 3 Southern                        5               9 Win   
 4 Michigan State                 13              18 Loss  
 5 Maryland                       19              23 Win   
 6 Western Illinois               13              20 Win   
 7 Arkansas                       23              33 Loss  
 8 California                      9              17 Loss  
 9 Purdue                         21              31 Win   
10 McNeese State                   8              10 Win   
# ℹ 21 more rows

mizzou_basketball <- mizzou_basketball |> ftpcts(var1 = free_throws_made, var2 = free_throws_att)

Running ‘ftpcts’ creates three variables: free throw percentage, whether or not Mizzou scored more free throws than the national average in a game, and whether or not their free throw percentage was higher than the national average in a game. Important to note is that, while 10 years earlier than LSU’s season, I have still included the 2017-18 averages when calculating Mizzou’s free throw averages. To change this is easy: simply modify the values in the ‘ifelse’ statements in the functions to better correspond to the season of interest.

ftm_summary <- ftm_summary_func(mizzou_basketball, ftm_average, result)
miz_ftm_diff <- ftm_summary[[2]][[2]] - ftm_summary[[2]][[1]]
miz_ftm_diff

[1] 0.4681818

Using ‘ftm_summary_func’, we calculate the win proportion of when Mizzou scores more or less than the average number of free throws in a game. Mizzou only exceeded the average in 10 of 31 games, but they won nearly every in which they did so; similarly, in the 21 games the Tigers did not score the average, they lost a majority of games. We see this when creating ‘miz_ftm_diff’, which is the difference in win proportions. Unlike LSU, Mizzou’s win differential is large (.468), suggesting that making free throws may have been a more important factor in determining the outcome of their games. One important note in this function is that while “var1” can be any variable, “var2” must denote the result of a game, typically “Win” or “Loss”.

ftstat_simulation <- vector("double", 5000)
for(i in 1:5000) {
  ftpct_summary <- mizzou_basketball |>
    mutate(ftm_average = sample(ftm_average)) |>
    group_by(ftm_average) |>
    summarize(win_rate = mean(result == "Win"))
   ftstat_simulation[[i]] <- ftpct_summary[[2]][[2]] - ftpct_summary[[2]][[1]]
}

view(ftstat_simulation)

Unfortunately, there is no function for the for loop creating random simulations that make up the null distribution, but the code is relatively straightforward. The variable, ‘ftm_average’ is created when running the ‘ftpcts’ function, but the variable ‘result’ may have to be modified if it has a different name in your data set. By running this for loop, we are able to create our null distribution and, using one more function, plot this distribution against our variable of choice and find a p-value.

ftstat_simulated <- tibble(ftstat_simulation = ftstat_simulation)

ft_histogram(ftstat_simulated, ftstat_simulation, alpha = .4, miz_ftm_diff)

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

p_value(ftstat_simulation, miz_ftm_diff)

[1] 0.0132

Results

After running our simulations and creating ‘ftstat_simulation’, we can make these results a tibble, in this case ‘ftstat_simulated’, and then put it into a plotting function. With our tibble (which functions as our dataset) created, we input our x variable of choice, in this case our win differential in each of 5000 simulations, along with our x-intercept line of Mizzou’s win differential. In this example, we can see that, unlike LSU, Mizzou’s win differential barely falls within the distribution, meaning there were few results in the simulation more extreme. When we run the p-value function, we return a value smaller less than 0.05, meaning we can reject our null hypothesis. With this, our model suggests that free throws attempted is a statistically significant variable when influencing the result of an individual basketball game.

Conclusion

The goal of this project was to create a model that tested the importance of making free throws in college basketball while also providing a function that allows for further analysis of this topic. The results of this project show that variability in free throw statistics and game results greatly differ. With this, we can both struggle to make wider conclusions about free throws but also notice potential trends about the trajectory of college basketball. Primarily, as the game of basketball has evolved away from the paint in favor of three-point shots, teams are shooting less free throws than ever. As three-pointers increase their effectiveness in influencing games, the importance of the free throw may be dwindling. However, asking any basketball coach about the importance of free throws will certainly yield a response favoring making shots from the charity stripe.