| category Mathematics  | tag Statistics

In a set of n randomly chosen people, there is some probability that at least one pair shares a birthday. By the pigeonhole principle, that probability reaches 100% once the group has 367 people, since there are only 366 possible birthdays (counting February 29). However, a 99.9% probability is reached with just 70 people, and a 50% probability with just 23, assuming that each of the 365 days of a non-leap year is equally likely to be a birthday.
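These figures come from working with the complement: the chance that all n birthdays are distinct. A short derivation, assuming 365 equally likely days and independent birthdays (the symbols Q(n) and P(n) are introduced here for illustration only):

$$
Q(n) = \prod_{k=0}^{n-1} \frac{365 - k}{365}, \qquad
Q(n) = Q(n-1) \cdot \frac{365 - n + 1}{365}, \qquad
P(n) = 1 - Q(n)
$$

Here Q(n) is the probability that no two of the n people share a birthday, and P(n) is the probability that at least one pair does. Under these assumptions, P(23) ≈ 0.507 and P(70) ≈ 0.999.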

### Calculating the probability

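# Preallocate vectors for group sizes 1 to 100
x <- rep(NA, 100)   # number of individuals
y <- rep(NA, 100)   # probability that all birthdays are distinct
p <- rep(NA, 100)   # probability that at least two share a birthday
# With a single person there is nobody to match
x[1] <- 1
y[1] <- 1
p[1] <- 0
for (i in 2:100) {
  x[i] <- i
  # The i-th person must avoid the i - 1 birthdays already taken
  y[i] <- y[i - 1] * (365 - i + 1) / 365
  p[i] <- 1 - y[i]
}
dat <- data.frame(numOfIndiv = x, prob = p)
# Keep the rows for 23 and 70 individuals, rounded to three decimals
dat2370 <- dat[dat$numOfIndiv == 23 | dat$numOfIndiv == 70, ]
dat2370$prob <- round(dat2370$prob, digits = 3)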
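As a cross-check on the loop, the same probabilities can be computed without explicit iteration. The following is a minimal sketch, assuming 365 equally likely birthdays; the names `days`, `n`, `p_no_match` and `p_match` are chosen here for illustration and are not part of the original code.

```r
# Vectorised version: cumprod() builds the running product of
# (days - k) / days for k = 0, 1, ..., 99 in a single step.
days <- 365                                   # assumed number of equally likely birthdays
n <- 1:100                                    # group sizes to evaluate
p_no_match <- cumprod((days - n + 1) / days)  # all n birthdays distinct
p_match <- 1 - p_no_match                     # at least one shared birthday

# Probabilities for 23 and 70 people, rounded to three decimals
round(p_match[c(23, 70)], 3)

# Smallest group size that reaches each threshold
min(n[p_match >= 0.5])
min(n[p_match >= 0.999])
```

The two `min()` calls should recover the group sizes of 23 and 70 quoted in the introduction, which is a quick way to confirm that the recurrence in the loop was set up correctly.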


### Plot the probability

library(ggplot2)
ggplot(dat, aes(x=numOfIndiv, y=prob)) +
geom_line() +
xlab("Number of Individuals") +
ylab("Probability of Having Two Individuals with the Same Birthday") +