Chapter 4 - Expectation

Question 1

Bobo, the amoeba from Chapter 2, currently lives alone in a pond. After one minute Bobo will either die, split into two amoebas, or stay the same, with equal probability. Find the expectation and variance for the number of amoebas in the pond after one minute. Question
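$$ E(X) = 1 $$ $$ VAR(X) = \dfrac{2}{3} $$ Answer
We have \(3\) possible outcomes \(E_1 = 0\), \(E_2 = 1\), and \(E_3 = 2\), each with an equal probability of occurring: \(p_1 = p_2 = p_3 = \frac{1}{3}\). The mean is $$ \sum_{i=1}^{3} E_i p_i = \dfrac{0 + 1 + 2}{3} = 1 $$ and the variance is $$ \sum_{i=1}^{3} (E_i - 1)^2 p_i = \dfrac{1 + 0 + 1}{3} = \dfrac{2}{3} $$ Explanation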

Question 2

In the Gregorian calendar, each year has either 365 days (a normal year) or 366 days (a leap year). A year is randomly chosen, with probability 3/4 of being a normal year and 1/4 of being a leap year. Find the mean and variance of the number of days in the chosen year Question
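$$ E(X) = 365.25 $$ $$ VAR(X) = 0.1875 $$ Answer
$$ \dfrac{3}{4} \cdot 365 + \dfrac{1}{4} \cdot 366 = 365.25 $$ $$ \dfrac{3}{4}(365 - 365.25)^2 + \dfrac{1}{4}(366 - 365.25)^2 = 0.1875 $$ Explanation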

Question 3

(a) A fair die is rolled. Find the expected value of the roll.

(b) Four fair dice are rolled. Find the expected total of the rolls. Question
A) $$ 3.5 $$ B) $$ 14 $$ Answer
A) $$ \sum_{i=1}^6 \dfrac{i}{6} = 3.5 $$ B) $$ 3.5 \cdot 4 = 14 $$ Don't overthink this. Expectation is linear, so if one die has an expected value of \(3.5\), then the total of \(4\) dice must have an expected value of \(14\). In other words, you just multiply the answer to part A by \(4\). Explanation

Question 4

A fair die is rolled some number of times. You can choose whether to stop after 1, 2, or 3 rolls, and your decision can be based on the values that have appeared so far. You receive the value shown on the last roll of the die, in dollars. What is your optimal strategy (to maximize your expected winnings)? Find the expected winnings for this strategy.

Hint: Start by considering a simpler version of this problem, where there are at most 2 rolls. For what values of the first roll should you continue for a second roll? Question
The best strategy is to stop after the first roll only if it is a 5 or 6,
and to stop after the second roll only if it is a 4, 5, or 6; otherwise roll again.

$$ E(\text{best strategy}) = \dfrac{14}{3} \approx 4.67 $$ Answer
To find our strategy, we need to know the expected value of continuing. If the current roll is better than the expected winnings from playing on optimally with the remaining rolls, you should stop rolling and take the money.
With one roll left there is no decision to make, so continuing is worth \(3.5\). With two rolls left, we keep a 4, 5, or 6 (average \(5\)) and otherwise roll again, so continuing is worth \(\frac{1}{2} \cdot 5 + \frac{1}{2} \cdot 3.5 = 4.25\).
$$ E(\text{2 rolls left}) = 4.25 $$ $$ E(\text{1 roll left}) = 3.5 $$ Using the above, we should only stop after the first roll if it is greater than 4.25 and after the second roll if it is greater than 3.5.
We can calculate the expected value of this strategy by finding the probability that we stop on a given roll, multiplied by the expected value of that roll.
Let \(p_j\) be the probability that we stop on roll \(j\) and \(v_j\) be the expected value when we stop on that roll.
\(p_1 = \dfrac{1}{3}\)
\(p_2 = \dfrac{2}{3} \cdot \dfrac{1}{2} = \dfrac{1}{3}\)
\(p_3 = \dfrac{2}{3} \cdot \dfrac{1}{2} = \dfrac{1}{3}\)
\(v_1 = \dfrac{5+6}{2} = 5.5\)
\(v_2 = \dfrac{4+5+6}{3} = 5\)
\(v_3 = \dfrac{1+2+3+4+5+6}{6} = 3.5\)
which gives us the equation: $$ \sum_{i=1}^{3} v_i p_i = \dfrac{5.5 + 5 + 3.5}{3} = 4 \dfrac{2}{3} $$ Explanation
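The stopping rule can be double-checked with a short backward-induction sketch. The function name `optimal_value` and its parameters are made up for this illustration; the only assumption is that, at each step, a face is kept exactly when it beats the expected winnings from continuing.

```python
from fractions import Fraction


def optimal_value(max_rolls, sides=6):
    """Expected winnings under optimal stopping, computed by backward induction."""
    faces = range(1, sides + 1)
    # With a single roll left there is no decision: take whatever comes up.
    value = Fraction(sum(faces), sides)
    for _ in range(max_rolls - 1):
        # One more roll available: keep a face only if it beats continuing.
        value = sum(max(Fraction(face), value) for face in faces) / sides
    return value


for rolls in (1, 2, 3):
    print(rolls, optimal_value(rolls), float(optimal_value(rolls)))
```

Exact fractions are used so that the thresholds (3.5 and 4.25) and the final value are not blurred by floating-point rounding.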

Question 5

Find the mean and variance of a Discrete Uniform r.v. on 1, 2,...,n. Hint: See the math appendix for some useful facts about sums Question
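$$ E(unif(n)) = \dfrac{n+1}{2} $$ $$ VAR(unif(n)) = \dfrac{n^2-1}{12} $$ Answer
Finding the expected value:
The expected value of a distribution is its mean. For a uniform distribution on \(1, 2, \dots, n\), each value has probability \(\frac{1}{n}\), so the mean is the sum of the values divided by \(n\).
$$ \dfrac{\sum\limits_{i=1}^{n} i}{n} = \dfrac{ {n+1 \choose 2} }{n} = \dfrac{n+1}{2} $$ Finding the variance: using \(\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}\), $$ E(X^2) - (E(X))^2 = \dfrac{(n+1)(2n+1)}{6} - \dfrac{(n+1)^2}{4} = \dfrac{n^2-1}{12} $$ Explanation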

Question 6

Two teams are going to play a best-of-7 match (the match will end as soon as either team has won 4 games). Each game ends in a win for one team and a loss for the other team. Assume that each team is equally likely to win each game, and that the games played are independent. Find the mean and variance of the number of games played. Question
$$ E(X) = 5.8125 $$ $$ V(X) \approx 1.03 $$ Answer
We use the binomial distribution to solve this type of problem because the games are independent.
There is a trick:
for the match to end after exactly \(n\) games, the winning team must win \(3\) of the first \(n-1\) games and then win game \(n\). So we count the ways to win \(3\) games out of the first \(n-1\) (at most \(6\)), not \(4\) out of \(7\), because we already know the winning team wins the last game. Either team can be the winner, and with \(p = q = \frac{1}{2}\) this factor of \(2\) cancels the probability \(p\) of winning the final game.
$$ p_4 = {3 \choose 3} p^3 q^0 = .125 $$ $$ p_5 = {4 \choose 3} p^3 q^1 = .25 $$ $$ p_6 = {5 \choose 3} p^3 q^2 = .3125 $$ $$ p_7 = {6 \choose 3} p^3 q^3 = .3125 $$ So the mean is:
$$ \sum_{n=4}^{7} n \cdot p_n = 5.8125 $$ Using the above information, we can find the variance. $$ var(X) = \sum_{i=4}^{7} (i - \mu)^2 \cdot p_i \approx 1.03 $$ Where \(\mu\) represents the mean found in the previous answer. Explanation
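The PMF, mean, and variance of the match length can be reproduced with a few lines of Python. This is only a sketch: `match_length_pmf` is a hypothetical helper name, and it writes out the win probability of each team separately rather than relying on the \(p = q\) cancellation.

```python
from fractions import Fraction
from math import comb


def match_length_pmf(wins_needed=4, p=Fraction(1, 2)):
    """PMF of the number of games in a first-to-`wins_needed` match."""
    q = 1 - p
    pmf = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The winner takes wins_needed - 1 of the first n - 1 games, then game n.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        pmf[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return pmf


pmf = match_length_pmf()
mean = sum(n * pr for n, pr in pmf.items())
variance = sum((n - mean) ** 2 * pr for n, pr in pmf.items())
print({n: float(pr) for n, pr in pmf.items()})
print(float(mean), float(variance))
```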

Question 7

A certain small town, whose population consists of 100 families, has 30 families with 1 child, 50 families with 2 children, and 20 families with 3 children. The birth rank of one of these children is 1 if the child is the firstborn, 2 if the child is the secondborn, and 3 if the child is the thirdborn.

(a) A random family is chosen (with equal probabilities), and then a random child within that family is chosen (with equal probabilities). Find the PMF, mean, and variance of the child’s birth rank.
(b) A random child is chosen in the town (with equal probabilities). Find the PMF, mean, and variance of the child’s birth rank. Question
A) $$ E(X) = 1.45 $$ $$ VAR(X) \approx 0.381 $$ B) $$ E(X) \approx 1.58 $$ $$ VAR(X) \approx 0.454 $$ I was surprised that these were different at all; my intuition told me they would be the same! Answer
A) We find the probability for each birth rank by multiplying the probability of selecting each type of family (\(0.3\), \(0.5\), and \(0.2\)) by the probability of picking a child of that rank within the family, then summing over the family types.
$$ P(X = x) = \begin{cases} 0.3 \cdot 1 + 0.5 \cdot \dfrac{1}{2} + 0.2 \cdot \dfrac{1}{3} = \dfrac{37}{60} & \text{if } x = 1 \\ 0.5 \cdot \dfrac{1}{2} + 0.2 \cdot \dfrac{1}{3} = \dfrac{19}{60} & \text{if } x = 2 \\ 0.2 \cdot \dfrac{1}{3} = \dfrac{1}{15} & \text{if } x = 3 \end{cases} $$ Which gives us a mean of $$ \sum_{i=1}^{3} i \cdot p_i = \dfrac{37}{60} + \dfrac{38}{60} + \dfrac{12}{60} = 1.45 $$ and a variance of $$ \frac{37}{60}(1-1.45)^2 + \frac{19}{60}(2-1.45)^2 + \frac{1}{15}(3-1.45)^2 \approx .381 $$ B) There are \(30 + 100 + 60 = 190\) children in total. All 100 families have a first child, 70 have a second, and 20 have a third.
$$ p(c_1) = \dfrac{100}{190} = \dfrac{10}{19} $$ $$ p(c_2) = \dfrac{70}{190} = \dfrac{7}{19} $$ $$ p(c_3) = \dfrac{20}{190} = \dfrac{2}{19} $$ so our PMF is $$ p(X) = \begin{cases} \frac{10}{19} & \text{if } x=1 \\ \frac{7}{19} & \text{if } x=2 \\ \frac{2}{19} & \text{if } x=3 \end{cases} $$ Which allows us to calculate the mean. $$ \mu = \dfrac{10}{19} + \dfrac{14}{19} + \dfrac{6}{19} = \dfrac{30}{19} \approx 1.58 $$ Now we can use the mean to calculate the variance. $$ \sum_{i=1}^{3} (i-\mu)^2 p_i = \dfrac{164}{361} \approx .454 $$ Explanation
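Both sampling schemes are easy to mix up, so here is a small sketch that builds each PMF directly from the family counts given in the question. The names `families`, `pmf_a`, `pmf_b`, and `mean_var` are invented for this example.

```python
from fractions import Fraction

# children per family -> number of such families (from the question)
families = {1: 30, 2: 50, 3: 20}
total_families = sum(families.values())
total_children = sum(size * count for size, count in families.items())


def mean_var(pmf):
    """Mean and variance of a PMF given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, var


# (a) pick a family uniformly, then a child uniformly within that family
pmf_a = {
    rank: sum(
        Fraction(count, total_families) * Fraction(1, size)
        for size, count in families.items()
        if size >= rank
    )
    for rank in (1, 2, 3)
}

# (b) pick a child uniformly among all children in the town
pmf_b = {
    rank: Fraction(
        sum(count for size, count in families.items() if size >= rank),
        total_children,
    )
    for rank in (1, 2, 3)
}

for label, pmf in (("a", pmf_a), ("b", pmf_b)):
    mean, var = mean_var(pmf)
    print(label, pmf, float(mean), float(var))
```

In scheme (a) children from small families are over-represented, which pulls the mean birth rank down relative to scheme (b).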

Question 8

A certain country has four regions: North, East, South, and West. The populations of these regions are 3 million, 4 million, 5 million, and 8 million, respectively. There are 4 cities in the North, 3 in the East, 2 in the South, and there is only 1 city in the West. Each person in the country lives in exactly one of these cities.

(a) What is the average size of a city in the country? (This is the arithmetic mean of the populations of the cities, and is also the expected value of the population of a city chosen uniformly at random.) Hint: Give the cities names (labels).
(b) Show that without further information it is impossible to find the variance of the population of a city chosen uniformly at random. That is, the variance depends on how the people within each region are allocated between the cities in that region.
(c) A region of the country is chosen uniformly at random, and then a city within that region is chosen uniformly at random. What is the expected population size of this randomly chosen city? Hint: To help organize the calculation, start by finding the PMF of the population size of the city.
(d) Explain intuitively why the answer to (c) is larger than the answer to (a). Question
A) $$ \mu = \dfrac{20,000,000}{10} = 2,000,000 $$ B) Unlike the mean, we need to know the exact population of each city in order to calculate the variance. For example, the two cities in the South could have \(2,500,000\) people each, or \(4,000,000\) and \(1,000,000\); the mean is \(2,000,000\) either way, but the squared deviations from the mean, and so the variance, differ.
C) To solve this, we must know the mean size of a city in each region and the probability of choosing that region. Taking the cities within a region to be equally sized (which does not change the expected value), $$ P(X) = \begin{cases} \dfrac{1}{4} & \text{if } x = 750,000 \\ \dfrac{1}{4} & \text{if } x = 1,333,333.\overline{3} \\ \dfrac{1}{4} & \text{if } x = 2,500,000 \\ \dfrac{1}{4} & \text{if } x = 8,000,000 \end{cases} $$ So the expected value of the chosen city is $$ \dfrac{750,000}{4} + \dfrac{1,333,333.\overline{3}}{4} + \dfrac{2,500,000}{4} + \dfrac{8,000,000}{4} = 3,145,833.\overline{3} $$ D) In part C, there is a \(\frac{1}{4}\) chance of selecting the city with \(8,000,000\) people in it, while in part A every city is equally likely, so that city is chosen with probability only \(\frac{1}{10}\).
By putting all the low-population cities in one region you make them impact the mean less. This is how gerrymandering works. Answer

Question 9

Consider the following simplified scenario based on Who Wants to Be a Millionaire?, a game show in which the contestant answers multiple-choice questions that have 4 choices per question. The contestant (Fred) has answered 9 questions correctly already, and is now being shown the 10th question. He has no idea what the right answers to the 10th or 11th questions are. He has one “lifeline” available, which he can apply on any question, and which narrows the number of choices from 4 down to 2. Fred has the following options available.

(a) Walk away with $16,000.
(b) Apply his lifeline to the 10th question, and then answer it. If he gets it wrong, he will leave with $1,000. If he gets it right, he moves on to the 11th question. He then leaves with $32,000 if he gets the 11th question wrong, and $64,000 if he gets the 11th question right.
(c) Same as the previous option, except not using his lifeline on the 10th question, and instead applying it to the 11th question (if he gets the 10th question right).

Find the expected value of each of these options. Which option has the highest expected value? Which option has the lowest variance? Question
The expected value is the sum, over the possible outcomes, of the probability of the outcome times its value. $$ E(walk) = 16,000 $$ $$ E(lifeline10) = \dfrac{1}{2} \cdot 1,000 + \dfrac{1}{2} \cdot \dfrac{3}{4} \cdot 32,000 + \dfrac{1}{2} \cdot \dfrac{1}{4} \cdot 64,000 = 20,500 $$ $$ E(lifeline11) = \dfrac{3}{4} \cdot 1,000 + \dfrac{1}{4} \cdot \dfrac{1}{2} \cdot 32,000 + \dfrac{1}{4} \cdot \dfrac{1}{2} \cdot 64,000 = 12,750 $$ Using your lifeline on the \(10^{th}\) question leads to the highest expected value, but just walking away with the $\(16,000\) has the least variance (zero, since the payoff is certain).
Answer
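As a cross-check, each option can be tabulated as a list of (probability, payoff) pairs and fed through the definitions of mean and variance. The helpers `play` and `mean_var` are hypothetical names for this sketch; the success probabilities are \(\frac{1}{2}\) on a question where the lifeline is used and \(\frac{1}{4}\) on a pure guess.

```python
from fractions import Fraction


def mean_var(outcomes):
    """Mean and variance of a list of (probability, payoff) pairs."""
    mean = sum(p * x for p, x in outcomes)
    var = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, var


def play(p10, p11):
    """Outcomes when Q10 is answered correctly w.p. p10 and Q11 w.p. p11."""
    return [
        (1 - p10, 1_000),            # wrong on question 10
        (p10 * (1 - p11), 32_000),   # right on 10, wrong on 11
        (p10 * p11, 64_000),         # right on both
    ]


options = {
    "walk": [(Fraction(1), 16_000)],
    "lifeline on 10": play(Fraction(1, 2), Fraction(1, 4)),
    "lifeline on 11": play(Fraction(1, 4), Fraction(1, 2)),
}
results = {name: mean_var(outcomes) for name, outcomes in options.items()}
for name, (mean, var) in results.items():
    print(f"{name}: mean={float(mean):,.0f}, variance={float(var):,.0f}")
```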

Question 10

Consider the St. Petersburg paradox (Example 4.3.13), except that you receive $\(n\) rather than $\(2^n\) if the game lasts for \(n\) rounds. What is the fair value of this game? What if the payoff is $\(n^2\)? Question
Answer
Explanation

Question 11

Martin has just heard about the following exciting gambling strategy: bet $1 that a fair coin will land Heads. If it does, stop. If it lands Tails, double the bet for the next toss, now betting $2 on Heads. If it does, stop. Otherwise, double the bet for the next toss to $4. Continue in this way, doubling the bet each time and then stopping right after winning a bet. Assume that each individual bet is fair, i.e., has an expected net winnings of 0. The idea is that $$ 2^0 + 2^1 + 2^2 + \dots + 2^n = 2^{n+1} - 1 $$ so the gambler will be $1 ahead after winning a bet, and then can walk away with a profit. Martin decides to try out this strategy. However, he only has $31, so he may end up walking away bankrupt rather than continuing to double his bet. On average, how much money will Martin win? Question
$$ E(X) = (1-.5^5) -31 \cdot .5^5 = 0 $$ Answer
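He has a high probability of winning a small amount and a small probability of losing a large amount of money. Since \(1 + 2 + 4 + 8 + 16 = 31\), Martin can afford at most \(5\) bets: he loses all $\(31\) only if all \(5\) tosses land Tails, and otherwise he ends up $\(1\) ahead.
$$ p_X(x) = \begin{cases} 1-.5^5 & \text{if } x = 1 \\ .5^5 & \text{if } x = -31 \end{cases} $$ Explanation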

Question 12

Let X be a discrete r.v. with support −n, −n + 1,..., 0,...,n − 1, n for some positive integer n. Suppose that the PMF of X satisfies the symmetry property P(X = −k) = P(X = k) for all integers k. Find E(X). Question
E(X) = 0 Answer
If each positive value has the same probability as its negative counterpart, there is symmetry about \(0\).
If there is symmetry about \(0\), then the mean will be \(0\), because the terms \(k \cdot P(X = k)\) and \(-k \cdot P(X = -k)\) cancel in pairs. Explanation

Question 13

Are there discrete random variables X and Y such that E(X) > 100E(Y ) but Y is greater than X with probability at least 0.99? Question
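Yes, there are many examples of this in real life.
Let \(X\) and \(Y\) be independent random variables built from the following Bernoulli distributions $$ X = 1,000,000,000,000 \cdot Bern(.00001) $$ $$ Y = Bern(.99999) $$ Then \(E(X) = 10,000,000\), which is greater than \(100E(Y) = 99.999\), while \(P(Y > X) = P(X = 0, Y = 1) = .99999^2 \approx .99998\). Both criteria are fulfilled. Answer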

Question 14

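Let X have PMF $$ P(X = k) = \dfrac{cp^k}{k} \quad \text{for } k = 1,2,\dots $$ where p is a parameter with \(0 < p < 1\) and c is a normalizing constant. We have \(c = -1/\log(1-p)\), as seen from the Taylor series $$ -\log(1-p) = p + \dfrac{p^2}{2} + \dfrac{p^3}{3} + \dots $$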

This distribution is called the Logarithmic distribution (because of the log in the above Taylor series), and has often been used in ecology. Find the mean and variance of X.

Question

Question 15

Player A chooses a random integer between 1 and 100, with probability \(p_j\) of choosing \(j\) (for \(j = 1, 2, \dots, 100\)). Player B guesses the number that player A picked, and receives that amount in dollars if the guess is correct (and 0 otherwise).

(a) Suppose for this part that player B knows the values of \(p_j\). What is player B’s optimal strategy (to maximize expected earnings)?
(b) Show that if both players choose their numbers so that the probability of picking j is proportional to 1/j, then neither player has an incentive to change strategies, assuming the opponent’s strategy is fixed. (In game theory terminology, this says that we have found a Nash equilibrium.)
(c) Find the expected earnings of player B when following the strategy from (b). Express your answer both as a sum of simple terms and as a numerical approximation. Does the value depend on what strategy player A uses? Question
A) Guessing \(j\) has expected earnings \(j p_j\), so player B should guess the value of \(j\) with the highest \(j p_j\).
B) Answer

Indicator r.v.s

Question 30

Randomly, k distinguishable balls are placed into n distinguishable boxes, with all possibilities equally likely. Find the expected number of empty boxes. Question
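$$ E(X) = n\left(\dfrac{n-1}{n}\right)^k $$ Answer
The probability that any given box is still empty after one ball is placed in a random box is \(\dfrac{n-1}{n}\).
Since the balls are placed independently, we raise \(\dfrac{n-1}{n}\) to the \(k^{th}\) power to get the probability that the box is empty after all \(k\) balls are placed.
Lastly, we multiply this by \(n\) because there are \(n\) boxes: the number of empty boxes is a sum of \(n\) indicator r.v.s, one per box, and expectation is linear.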
Explanation
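For small \(n\) and \(k\), the formula can be compared against a brute-force count over all \(n^k\) equally likely placements. This is just a sanity-check sketch; both function names are invented for the example.

```python
from fractions import Fraction
from itertools import product


def expected_empty_brute_force(n, k):
    """Average number of empty boxes over all n**k equally likely placements."""
    total_empty = 0
    for placement in product(range(n), repeat=k):
        # placement[i] is the box that ball i lands in
        total_empty += n - len(set(placement))
    return Fraction(total_empty, n**k)


def expected_empty_formula(n, k):
    """n * ((n - 1) / n) ** k, from indicator r.v.s and linearity."""
    return n * Fraction(n - 1, n) ** k


for n in range(1, 5):
    for k in range(1, 6):
        assert expected_empty_brute_force(n, k) == expected_empty_formula(n, k)
print(expected_empty_formula(3, 2))  # 3 boxes, 2 balls
```

The enumeration grows as \(n^k\), so it is only practical for tiny cases; its purpose is to confirm the closed form, not to replace it.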