Oct 4, 2019

# Introduction

This week's Riddler is a twist on the classic birthday problem. The birthday problem tells us that in a group of just 23 people, the chance of finding at least one pair of matching birthdays already exceeds 50%. But what if we want to find three matching birthdays instead?

In the U.S. Senate, three senators happen to share the same birthday of October 20: Kamala Harris, Brian Schatz and Sheldon Whitehouse.

How many people do you need to have better-than-even odds that at least three of them have the same birthday? (Again, ignore leap years.)

# Solution

The U.S. Senate is a group of 100 people. It turns out that in a group of that size, at least three people share a birthday roughly 64% of the time. To reach 50% odds, we need a group of 88 people.

# Methodology

The classic birthday problem can be solved with pen, paper, and probability. We consider two possibilities: no shared birthdays, or at least one shared birthday. Because these are the only two possibilities, their probabilities must sum to 100%. It's often easier to compute the probability that all birthdays are unique, then subtract that from 100% to get the answer.
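The complement trick takes only a few lines of code. This is a minimal sketch assuming 365 equally likely birthdays; `p_shared_pair` is a hypothetical helper name, not part of the original code.

```python
from math import prod

def p_shared_pair(n, days=365):
    """Probability that at least two of n people share a birthday."""
    # Chance that all n birthdays differ: (365/365) * (364/365) * ... * ((365 - n + 1)/365)
    p_unique = prod((days - k) / days for k in range(n))
    # Every other outcome contains at least one shared birthday
    return 1 - p_unique

print(round(p_shared_pair(22), 3))  # 0.476
print(round(p_shared_pair(23), 3))  # 0.507
```

Twenty-three is the first group size where the probability crosses 50%.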

However, with three birthdays the math becomes much more involved. For a group of size $n$, we would have to partition the possible outcomes into cases like these:

1. $n$ distinct birthdays
2. 1 pair of shared birthdays and $n-1$ distinct birthdays in total
3. 2 pairs of shared birthdays and $n-2$ distinct birthdays in total
4. 3 pairs of shared birthdays and $n-3$ distinct birthdays in total, and so on

At the extreme, we would have $\lfloor n/2 \rfloor$ pairs of birthdays (for even $n$, everyone in the group has a birthday buddy). The final case is that at least three people share the same birthday. We sum the probabilities of every case except the one we're interested in, subtract from 100%, and arrive at our answer.
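For comparison with the simulation, the sum over cases can be computed exactly. With $k$ pairs and $n-2k$ people holding unique birthdays, we choose the pair days and singleton days, then assign people to them:

$$P(\text{no triple}) = \frac{1}{365^n}\sum_{k=0}^{\lfloor n/2 \rfloor} \binom{365}{k}\binom{365-k}{n-2k}\frac{n!}{2^k}$$

The sketch below evaluates this with exact integers, assuming 365 equally likely birthdays. `p_triple` is a hypothetical name; this exact count is a cross-check, not the method used in this post.

```python
from math import comb, factorial

def p_triple(n, days=365):
    """Exact probability that at least three of n people share a birthday."""
    no_triple = 0  # assignments in which no day is used three or more times
    for k in range(n // 2 + 1):      # k = number of birthday pairs
        singles = n - 2 * k          # people whose birthday nobody else has
        if k + singles > days:       # too few days for this pattern
            continue
        # Pick the k pair days, then the singleton days from the remaining days
        day_choices = comb(days, k) * comb(days - k, singles)
        # Assign people to those days; order within each pair doesn't matter
        people_choices = factorial(n) // 2**k
        no_triple += day_choices * people_choices
    # Complement of "no triple" among all days**n equally likely assignments
    return 1 - no_triple / days**n

# Smallest group with better-than-even odds of a triple
n = 1
while p_triple(n) < 0.5:
    n += 1
print(n)  # 88
print(round(p_triple(100), 3))
```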

Instead, I opted to solve this problem using simulated trials. We add people to a group one at a time until three of them share a birthday, and record the group size at that moment. If we repeat this for one million groups, the median of the recorded group sizes estimates the point at which we cross 50% likelihood.

Python does the heavy lifting this week. The function itself is short, but we rely on a million sampled groups to converge on the correct answer.

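```python
import random

def model(n=3, days=365):
    """Add people one at a time until n of them share a birthday;
    return the group size at that moment."""
    size = 0
    # seen[k] holds the days that have come up at least k + 1 times so far
    seen = [set() for _ in range(n)]
    while len(seen[-1]) == 0:
        # Days are 0..days-1; randint is inclusive, so the upper bound is days - 1
        i = random.randint(0, days - 1)
        size += 1
        # Record this birthday in the first tier where it hasn't appeared yet
        for s in seen:
            if i not in s:
                s.add(i)
                break
    return size
```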
Each time we run `model()`, it returns one random sample of the number of people required to find three matching birthdays. The function is also general enough to search for any number of matching birthdays (for example, `model(n=4)` samples the number of people needed for four matching birthdays), though each run takes longer as `n` grows.
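Turning samples into an answer takes a small driver that repeats the trial and takes the median. This is a minimal sketch, not the exact script behind the results above: `people_until_match` and `median_group_size` are hypothetical names, the sampler tracks counts with a `Counter` instead of the tiered sets in `model()`, and it uses 10,000 seeded trials rather than a million so it runs in a few seconds.

```python
import random
import statistics
from collections import Counter

def people_until_match(n=3, days=365, rng=random):
    """Add people one at a time; return the group size once n share a birthday."""
    counts = Counter()
    size = 0
    while True:
        size += 1
        day = rng.randrange(days)  # birthdays 0..days-1, equally likely
        counts[day] += 1
        if counts[day] == n:
            return size

def median_group_size(n=3, trials=10_000, seed=2019):
    """Median of many sampled group sizes, roughly where P(match) reaches 50%."""
    rng = random.Random(seed)  # fixed seed so reruns give the same answer
    samples = [people_until_match(n, rng=rng) for _ in range(trials)]
    return statistics.median_low(samples)  # always returns an actual sample value

print(median_group_size(n=2))  # classic birthday problem: should land near 23
print(median_group_size(n=3))  # should land near 88
```

Running with `n=2` is a useful sanity check, since it should reproduce the classic answer of 23. With fewer trials the median can drift by a person or two; more trials tighten it at the cost of runtime.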