Uniform Numbers Across Sports

Sep 24, 2019 00:00 · 774 words · 4 minute read sports r dataviz

A recent episode of Deadspin’s Deadcast podcast raised the question: “What is the best number to wear in any sport?” So I scraped several sports databases for their historical player numbers.

For our purposes, player counts represent distinct player-season-number combinations. So a player wearing #1 for 10 years gets counted 10 times. If you change numbers mid-season, you’re counted twice for that season. If you’re a Chris Gatling type and get traded multiple times in a season, we’re only counting you once for that season if you don’t change numbers.
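To make the counting rule concrete, here is a minimal sketch on made-up data. The players "A", "B" and "C" and the `stints` data frame are hypothetical, and it uses base R rather than the tidyverse code below so that it runs on its own.

```r
# Toy data (hypothetical players): one row per player-team-season stint.
stints <- data.frame(
  player = c("A", "A", "B", "B", "C", "C"),
  year   = c(2001, 2002, 2001, 2001, 2001, 2001),
  number = c("1", "1", "7", "12", "33", "33"),
  stringsAsFactors = FALSE
)

# Collapse to distinct player-season-number combinations.
# Player C was traded mid-season but kept #33, so C's two rows become one.
combos <- unique(stints[, c("player", "year", "number")])

# Count the remaining combinations for each number.
counts <- vapply(split(combos, combos$number), nrow, integer(1))
print(counts)
```

Player A wears #1 in two seasons and is counted twice for #1, player B switches from #7 to #12 mid-season and is counted once under each, and player C is counted once for #33.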

library(tidyverse)
number_totals <- function(df) {
  ndf <- df %>% 
    distinct(player, year, number) %>% 
    group_by(number) %>% 
    summarise(n=n_distinct(player, year)) %>% 
    ungroup()
  
  # some numbers have never been used.
  missing_numbers <- setdiff(as.character(1:99), ndf$number)
  if (length(missing_numbers)) {
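    # keep number as character so it binds cleanly with ndf on older R versions
    missings <- data.frame(number=missing_numbers, stringsAsFactors=FALSE)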
    missings$n <- 0
    
    ndf <- rbind(ndf, missings)
  }
  # function to numerically sort strings
  numsort <- function(v) { str_sort(v, numeric=TRUE) }
  ndf %>% mutate(number=factor(number))  %>% mutate(number=fct_relevel(number, numsort))
}

plot_numbers <- function(df) {
  ticks <- c('0', '1', '10', '20', '30', '40', '50','60','70','80','90')
  df %>% 
    number_totals() %>%
    ggplot(aes(x=number, y=n)) + 
      geom_bar(stat='identity') + 
      scale_x_discrete(breaks=ticks) + 
      ylab("Player Seasons") + 
      xlab("Jersey Number") + 
      theme_classic()
}

Basketball

basketball-reference.com has NBA jersey numbers going back to the 1946-47 season (technically BAA jersey numbers in those years, since it wasn’t the NBA yet.)1

nba <- read_csv("nba.csv")
## Parsed with column specification:
## cols(
##   player = col_character(),
##   year = col_double(),
##   number = col_character()
## )
plot_numbers(nba)

In the NBA we see a pretty chunky pattern. NBA players seem to prefer jerseys ending in digits 0-5. As it turns out, while the NBA doesn’t have any rule requiring jersey numbers to be chosen this way, the NCAA does. Players want to keep the old numbers they are comfortable with, so the NBA ends up reflecting that. The relative lack of numbers greater than 55 reflects the same rule.
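One way to put a number on that pattern is to flag jersey numbers made up only of the digits 0-5. This is a rough sketch: the helper name `uses_digits_0_to_5` and the regular expression are my own encoding of the rule as described above, not something taken from the NCAA rulebook or from the scraped data.

```r
# TRUE when a one- or two-digit jersey number uses only the digits 0-5
# (e.g. "0", "23", "55"); FALSE otherwise (e.g. "6", "56", "99").
uses_digits_0_to_5 <- function(number) {
  grepl("^[0-5]{1,2}$", number)
}

# A few example numbers, stored as strings like the `number` column in nba.csv.
numbers <- c("0", "00", "6", "23", "34", "45", "55", "56", "99")
legal <- uses_digits_0_to_5(numbers)
print(data.frame(number = numbers, legal = legal))
```

Applied to the real data, something like `mean(uses_digits_0_to_5(nba$number))` would give the share of rows whose numbers fit the pattern.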

Football

pro-football-reference.com gives us player numbers as far back as 1947. So let’s take a look.

nfl <- read_csv("nfl.csv")
## Parsed with column specification:
## cols(
##   player = col_character(),
##   year = col_double(),
##   number = col_character()
## )
plot_numbers(nfl)

The NFL has the strongest rules about uniform numbers. Blocks of numbers are reserved for different positions, allowing the refs to know who is eligible to do what during a play. That’s why we see the pretty broad spread between 1 and 99. 0 and 00 are both rare though.

Baseball

baseball-reference.com gives us uniform numbers going back to 1923.

mlb <- read_csv('mlb.csv')
## Parsed with column specification:
## cols(
##   player = col_character(),
##   year = col_double(),
##   number = col_character()
## )
plot_numbers(mlb)

Baseball is a game of traditions. One of those is that players prefer lower numbers, leaving higher numbers for minor league players. Not many players wear unlucky #13 here. #42 was retired league-wide in 1997 in honor of Jackie Robinson, which explains that dip.

Hockey

Let’s look at hockey. hockey-reference.com has player numbers for the NHL going back to the 1950-1951 season.

nhl <- read_csv('nhl.csv')
## Parsed with column specification:
## cols(
##   player = col_character(),
##   year = col_double(),
##   number = col_character()
## )
plot_numbers(nhl)

NHL players tend to prefer lower numbers. Number 1 is usually reserved for goalies, and the next few numbers for defensemen.2 Unlucky #13 is usually not picked. NHL players generally do not use #0 or #00.

Which fits best?

nba2 <- nba %>% 
  distinct(player, year, number) %>%
  group_by(number) %>% 
  summarise(n=n_distinct(player, year)) %>% 
  mutate(sport='NBA')

nfl2 <- nfl %>% 
  distinct(player, year, number) %>%
  group_by(number) %>% 
  summarise(n=n_distinct(player, year)) %>% 
  mutate(sport='NFL')

mlb2 <- mlb %>% 
  distinct(player, year, number) %>%
  group_by(number) %>% 
  summarise(n=n_distinct(player, year)) %>% 
  mutate(sport='MLB')

nhl2 <- nhl %>% 
  distinct(player, year, number) %>%
  group_by(number) %>% 
  summarise(n=n_distinct(player, year)) %>%
  mutate(sport='NHL')

numsort <- function(v) { str_sort(v, numeric=TRUE) }
 
sports <- rbind(nba2, nfl2, mlb2, nhl2) %>% mutate(number=factor(number))  %>% mutate(number=fct_relevel(number, numsort))

number_counts <- sports %>% group_by(number) %>% summarise(n=sum(n)) %>% ungroup() %>% arrange(desc(n))

ticks <- c('0', '1', '10', '20', '30', '40', '50','60','70','80','90')
ggplot(sports, aes(x=number, y=n, fill=sport)) + 
  geom_bar(stat='identity') + 
  scale_x_discrete(breaks=ticks) + 
  xlab("Uniform Number") + 
  ylab('Player-Seasons') + 
  theme_classic()

Ignoring league-size effects, the most common number is 22. Anything over 70 is just an NFL number. Any double-digit number where the ones digit is greater than 5 outs you as a non-basketball player. Combine that with the MLB and NHL preference for smaller jersey numbers, and the 20s are your best bet.
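The totals above are raw counts, so leagues with bigger rosters contribute more player-seasons. One way to take league size into account would be to convert each league's counts to shares before combining them. The sketch below uses made-up numbers in a hypothetical `toy` data frame shaped like `sports`; it is an illustration of the idea, not the analysis behind the chart.

```r
# Toy counts (made up for illustration), same columns as `sports` in the post.
toy <- data.frame(
  sport  = c("NFL", "NFL", "NBA", "NBA"),
  number = c("22", "80", "22", "80"),
  n      = c(300, 700, 90, 10),
  stringsAsFactors = FALSE
)

# Share of each league's player-seasons that wore each number.
toy$share <- toy$n / ave(toy$n, toy$sport, FUN = sum)

# Raw totals let the larger league dominate; mean shares weight leagues equally.
raw_total  <- tapply(toy$n, toy$number, sum)
mean_share <- tapply(toy$share, toy$number, mean)

best_raw   <- names(raw_total)[which.max(raw_total)]
best_share <- names(mean_share)[which.max(mean_share)]
print(c(raw = best_raw, share = best_share))
```

In this toy case the raw totals favor #80 because the larger league wears it a lot, while the equal-weight shares favor #22.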


  1. They also have several players with uniform numbers with leading zeros playing for the Rochester Royals in the 1950s. I was unable to figure out whether these were data entry errors or real distinctions. I left them out.

  2. This appears to be more of an unwritten rule though.
