# Information and the Two Child Problem

Last year (wow…time flies), I posted a solution to the Two Child problem using Bayes' theorem. If you are unfamiliar with this problem, you may want to read that post first.

There has continued to be discussion on this topic on the LinkedIn group where I was originally introduced to it. One of the comments to my previous post summarizes many of the issues that were brought up.

Let me summarize:

One of the objections is that you have to consider why the person is providing this information. Knowing something about why you were provided the information you have and why you were not provided information you don’t have can certainly be used to improve your inferences.

However, without this additional information, you have two choices: (1) do nothing but worry about what would happen if you did have the additional information, or (2) proceed to make an inference based on the information you *do* have.

Many times when solving a problem, I wish that I had more information. But then I am reminded of the words of my grandmother, “Wish in one hand and spit in the other, and see which gets full first!”

When solving an inference problem, you must use only the information at hand—not the information you want (sounds like a Donald Rumsfeld quote). In the current problem, you are simply informed that “one of the children is a boy”. You do not know which child the person is referring to, and you do not know why you were provided with that information.

As described above, there are two ways to proceed with this problem: solve the problem at hand, which is what I illustrated previously using Bayes' theorem, or expand the problem by considering why you were provided the information that you have, rather than some other information. If the information came to you by means of a person informing you, you might consider the intention or motivation of that person. If you know something about that person already, this knowledge could be helpful. However, this changes the problem.

Keep in mind now that this is a puzzle. And one (of the many) purposes of a puzzle is to educate. By changing the problem, you lose the opportunity to learn something. So let’s focus on the puzzle that was posed and resist the temptation to consider variants. In my own personal experience, I find such temptations to indicate that I don’t know how to solve the puzzle that was presented to me.

What I showed in the previous post was that if you know that a person has “two children at least one of whom is a boy” then the probability is 1/3 that both are boys; whereas if you know that a person has “two children one of whom is a boy born on a Tuesday” the probability that both are boys changes to 13/27.

What we learn is that *any* information about the child improves your inference as to whether both children are boys. I found this to be quite shocking, especially since the information that the child was born on Tuesday is at first glance irrelevant.

In the solution using Bayes' theorem, you can see that this information about the day the child was born *does* indeed affect your inferences: it enters via the sum rule, where you subtract off a term involving the possibility that both children are boys born on a Tuesday. This ensures that you do not double-count that particular case.
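The sum-rule bookkeeping can be checked exactly with rational arithmetic. This is my own sketch, not part of the original derivation; it assumes sexes and weekdays are uniform and independent.

```python
from fractions import Fraction

# p is the probability that any one child is a boy born on a Tuesday
# (assuming sexes and weekdays are uniform and independent).
p = Fraction(1, 2) * Fraction(1, 7)                   # 1/14

# Sum rule, subtracting off the double-counted "both are Tuesday boys" case:
p_evidence = p + p - p * p                            # P(at least one Tuesday boy) = 27/196

# P(both boys AND at least one Tuesday boy): given two boys, each is a
# Tuesday boy with probability 1/7, so apply the same sum rule within BB.
q = Fraction(1, 7)
p_bb_and_evidence = Fraction(1, 4) * (q + q - q * q)  # 13/196

print(p_bb_and_evidence / p_evidence)                 # 13/27
```

Dropping the subtracted `p * p` term would double-count the case where both children are Tuesday boys, and the 13/27 result would be spoiled.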

In fact, the more unlikely that case (both being boys born on Tuesday), the more distinguishable the children are from one another.

One can now easily see that *any* information that enables you to distinguish one child from the other will improve your inference. Let’s take a closer look at the problem from this perspective by considering several different states of knowledge and the probabilities that a rational agent would infer that both children are boys. I will denote this probability by Prob(BB) and leave proofs to the reader.

- “I have two children”: Prob(BB) = 1/4
- “I have two children at least one of whom is a boy”: Prob(BB) = 1/3
- “I have two children at least one of whom is a boy born on a Tuesday”: Prob(BB) = 13/27
- “I have two children, and here in front of you is my son Bob”: Prob(BB) = 1/2

In the least informed case, you only know that the person has two children, so the probability that you would expect both to be boys is 1/4. In the most informed case, where you meet one of the children, the probability that both are boys is equal to the probability that the other un-met child is a boy (1/2). As you start from the information that there are two children and learn more and more about the children, the probability that both are boys changes from a minimum of 1/4 to a maximum of 1/2.
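All four states of knowledge can be verified by brute-force enumeration. The sketch below is my own check, with day 2 standing in for Tuesday (an arbitrary labeling); all 196 (sex, weekday) combinations for two children are taken as equally likely.

```python
from fractions import Fraction
from itertools import product

kids = list(product("BG", range(7)))        # (sex, weekday) for one child
families = list(product(kids, kids))        # (older, younger), 196 cases

def prob_bb(keep):
    kept = [f for f in families if keep(f)]
    bb = sum(f[0][0] == "B" and f[1][0] == "B" for f in kept)
    return Fraction(bb, len(kept))

p_plain = prob_bb(lambda f: True)                         # 1/4
p_boy = prob_bb(lambda f: any(c[0] == "B" for c in f))    # 1/3
p_tuesday = prob_bb(lambda f: ("B", 2) in f)              # 13/27

# "Here is my son": condition on a randomly met child being a boy.
met = [(f, i) for f in families for i in (0, 1)]
kept = [(f, i) for f, i in met if f[i][0] == "B"]
p_meet = Fraction(sum(f[0][0] == "B" and f[1][0] == "B" for f, _ in kept),
                  len(kept))
print(p_plain, p_boy, p_tuesday, p_meet)                  # 1/4 1/3 13/27 1/2
```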

Focusing on the puzzle rather than being distracted by all the other puzzles you could have been solving enables you to learn something from it. Here we find that this puzzle is about the problem of distinguishability versus indistinguishability. And we see that it is not at all trivial.

## 4 comments for “Information and the Two Child Problem”

1. JeffJo
October 19, 2011 at 5:31 pm

The question you asked, “What is the probability I have two boys?”, is ambiguous if you are not given enough information, in the statement leading up to it, to define the sample space for that probability. So your first comment about my reply is a non sequitur. Your second, “You do not know which child,” is frequently used to dismiss a naive solution that also leads to the answer of 1/2. But that was not my solution; I never inferred a specific child. I got the same value as those who do, but by a different method.

The information at hand is that a certain fact applies to a random family. So that fact must necessarily apply to every family we count as a possibility. But nothing about the presentation of the fact makes it a sufficient condition, i.e., that every family the fact applies to must be counted this way. And the precedent for puzzles (and it’s true in real life) says we shouldn’t. That precedent is set by the Game Show Problem, and by the Three Prisoners Problem, which is mathematically and logically equivalent. You get the wrong answer for them by following your advice. Let’s look at the latter:

The governor has decided to pardon one of Tom, Dick, or Harry. She will announce who at noon tomorrow. Overnight, Tom finds out that the warden knows who will be pardoned. The warden won’t tell Tom about his own fate, but does tell him that Dick will not be pardoned. Tom thinks his chances for the pardon have increased, from 1 in 3 to 1 in 2. Is he right? The generally accepted answer is that he is not.

Tom is using only the fact immediately at hand to make his assessment. There were three possibilities originally, and one was eliminated by the one fact presented to him. Of the two remaining, he gets pardoned in one. This is the exact same logic you are using about families, and it is incomplete. If Tom has been chosen to receive the pardon, there would be two statements of the form “‘Name’ will not be pardoned” that the warden could have used. The only way for Tom to get the accepted answer, that his chances are still 1 in 3, is for him to infer more than just the “fact at hand.” He must assume that the warden chooses randomly between whatever set of statements are available to her, that fit the implied form.
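The warden's free choice is the crux, and it is easy to confirm by simulation. This sketch is an added illustration (not JeffJo's own code); it assumes the warden picks uniformly at random when both statements are available to her.

```python
import random

# Monte Carlo for the Three Prisoners story above. Key assumption: when
# Tom is pardoned, the warden chooses uniformly between "Dick will not
# be pardoned" and "Harry will not be pardoned".
random.seed(0)
told_dick = tom_given_dick = 0
for _ in range(100_000):
    pardoned = random.choice(["Tom", "Dick", "Harry"])
    options = [n for n in ("Dick", "Harry") if n != pardoned]
    named = random.choice(options)        # the warden's statement
    if named == "Dick":
        told_dick += 1
        tom_given_dick += pardoned == "Tom"
print(tom_given_dick / told_dick)         # ≈ 1/3, not 1/2
```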

The precedent is also set by Bertrand’s Box Paradox, which is mathematically equivalent to those two puzzles but logically closer to the Two Child Paradox. I’ll paraphrase it to show how: If I say I have two children, what is the probability they share the same gender? That seems easy: 1/2. But what if I give you a sealed envelope and tell you that one gender of a child (not necessarily of a specific child) is written inside it? If you open it and read “boy,” you may be tempted to say the probability has changed to 1/3. But if you are tempted that way, you must say the same thing if you read “girl.” And if the probability is 1/3 regardless of what I wrote, you don’t need to open the envelope at all. The Law of Total Probability says the probability must be 1/3 even without the envelope.
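The envelope argument can likewise be simulated. In this added sketch, the assumption is that for a mixed-gender family the writer records “boy” or “girl” with probability 1/2 each (modeled by picking a child uniformly and writing that child's gender); conditioning on reading “boy” then leaves the same-gender probability at 1/2, as the Law of Total Probability demands.

```python
import random

# Envelope version of Bertrand's Box Paradox, per the paraphrase above.
random.seed(1)
read_boy = same_given_boy = 0
for _ in range(100_000):
    kids = [random.choice("BG") for _ in range(2)]
    written = random.choice(kids)         # gender written in the envelope
    if written == "B":
        read_boy += 1
        same_given_boy += kids[0] == kids[1]
print(same_given_boy / read_boy)          # ≈ 1/2, matching total probability
```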

This apparent paradox can be resolved only if you recognize that if my children do not share the same gender, whatever fact “is at hand” (that is, in the envelope) had to have been chosen by a method that writes “boy” with probability P, and “girl” with probability 1-P. The question cannot be answered without assigning a value to P. You tacitly assumed P=1, but there is no more justification for that than Tom assuming the same thing. If the statement does not provide a value, you can only assume P=1/2.

And the only way additional information can change the answer, is if you *know* that information was part of a sufficient condition used to identify families. If I ask you whether one of your two children is a boy who was born on a Tuesday, and you say yes, the probability both are boys is indeed 13/27. But if you volunteer that information without being prompted, the probability based on the information at hand plus required inferences is 1/2. Because you might have said “one is a girl born on a Thursday” when you also had a Tuesday boy. This is the teaching point that must be stressed in all of these problems: that a “fact at hand” is only a necessary condition, and you must determine a sufficient condition as well. This has been known since 1889, when Bertrand published “Calcul des probabilités”, and was reiterated by Martin Gardner when he popularized the Two Child Paradox (Scientific American, October 1959).
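The asked-versus-volunteered distinction can be made concrete with a small simulation (my sketch, with day 2 standing in for Tuesday; protocol 2 assumes the parent reports the gender and birth day of one uniformly chosen child).

```python
import random

random.seed(2)

def family():
    # Two children, each with a uniform random sex and weekday of birth.
    return [(random.choice("BG"), random.randrange(7)) for _ in range(2)]

# Protocol 1: you asked "is one of your children a boy born on a Tuesday?"
yes = bb_asked = 0
for _ in range(200_000):
    f = family()
    if ("B", 2) in f:
        yes += 1
        bb_asked += f[0][0] == "B" and f[1][0] == "B"
print(bb_asked / yes)                     # ≈ 13/27 ≈ 0.481

# Protocol 2: the parent volunteers one fact about a randomly chosen
# child, and it happens to be "a boy born on a Tuesday".
told = bb_told = 0
for _ in range(200_000):
    f = family()
    if random.choice(f) == ("B", 2):
        told += 1
        bb_told += f[0][0] == "B" and f[1][0] == "B"
print(bb_told / told)                     # ≈ 1/2
```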

And if we really want to get pedantic, when you say “I have two children, and here in front of you is my son Bob,” the probability you have two boys is slightly less than 1/2. If we assume “Bob” is a common name, that is. The reason is that you won’t have two children named Bob. It is easiest to see how if we consider only three possible boys’ names: Andy, Bob, and Carl. All else being equal, say that “Bob” is twice as likely as either “Andy” or “Carl,” which are equally likely. The probability you will name a child “Bob” is 1/2 for your first son, but drops to 1/3 for your second. So Prob(Older Bob with a sister) = Prob(Older Bob with a brother) = Prob(Younger Bob with a sister) = 1/8, but Prob(Younger Bob with a brother) = 1/12. That makes the answer Prob(BB|Bob) = 5/11.
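The toy naming model can be checked exactly. This bookkeeping is my own, following the weights given above: “Bob” twice as likely as “Andy” or “Carl”, and no repeated names within a family.

```python
from fractions import Fraction

# Three boys' names, with "Bob" twice as likely as either of the others.
weights = {"Andy": Fraction(1, 4), "Bob": Fraction(1, 2), "Carl": Fraction(1, 4)}

def p_names(first, second):
    # P(first son gets name `first`, second son gets `second` from the rest).
    return weights[first] * weights[second] / (1 - weights[first])

# P(a second son is named Bob), summed over the first son's name:
p_second_bob = sum(p_names(n, "Bob") for n in ("Andy", "Carl"))   # 1/3

p_bg = Fraction(1, 4) * weights["Bob"]    # older Bob, younger girl: 1/8
p_gb = Fraction(1, 4) * weights["Bob"]    # younger Bob, older girl: 1/8
p_bb = Fraction(1, 4) * (sum(p_names("Bob", n) for n in ("Andy", "Carl"))
                         + p_second_bob)  # 1/8 + 1/12 = 5/24
print(p_bb / (p_bg + p_gb + p_bb))        # 5/11
```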

The same trend applies to any common name, but the difference from 1/2 is much less with realistic distributions. So no, I really don’t think you should go into that detail; I mention it to point out that applying my reasoning from above means you would mention your other child half the time in each situation. That term will cancel out of the calculation, which is why you frequently can get away with ignoring the difference between necessary and sufficient.

2. October 21, 2011 at 2:10 am

Thank you for your latest comment. I believe that I now better understand the point you are making… and it is a good one.

There has indeed been much discussion in the literature about the ambiguity of the statement, “I have two children at least one of whom is a boy”. Some of this ambiguity originates from the inappropriate (and somewhat antiquated) idea that this refers to a family at random rather than a particular family. There is no sample space. There is a single family about whom you are making an inference.

The remaining ambiguity arises from potential additional assumptions that are made about the problem. The point you make is that it matters whether you had asked for that information or whether the person simply offered it, since in the latter case they had a choice as to which information to offer. This is a very good point, which has been discussed in the past (as you mention) as well as more recently.

Certainly, if you know the motivation of the person who offered the information, then this must be taken into account when performing your inferences. Furthermore, if there is a motivation, then it is necessary information.

So the problem as typically worded has two important elements:
1. How did you acquire the information or what was the motivation of the person providing the information?
2. How can additional information, such as the fact that the boy was born on a Tuesday, affect your inferences?

My intention was to focus on the second question, since it is surprising that in some situations the fact that the boy was born on Tuesday actually matters. The result is that it makes the children more distinguishable.

However, the question remains as to how to solve the problem when you do not possess this information. Your point is well taken that if there is some motivation and you do not account for it, then you will get garbage answers. But this is inference, and one does not always know that one is missing necessary information. So one does the best one can.

I can show that the solution that gives P(BB) = 1/2 has more information content (and therefore assumes more) than the solution that gives P(BB) = 1/3. This can be shown simply by computing the entropy.

Given the fact that you are told “I have two children at least one of whom is a boy”, the solution you propose leads to P(BB) = 1/2. I think that you will agree that since you know there is at least one boy P(GG) = 0. This leaves P(BG) which I will take as the probability that the oldest child is a boy and the youngest is a girl, and P(GB) which is similarly defined. Without additional information, you must assign equal probabilities so that P(BG) = P(GB). Since they must sum to 1/2, this gives the following probability distribution across the possible states of the sex of the two children:
P(GG) = 0
P(BG) = 1/4
P(GB) = 1/4
P(BB) = 1/2
The entropy of this distribution is -(1/4) log(1/4) - (1/4) log(1/4) - (1/2) log(1/2) = 1.0397 nats = 1.5 bits

The solution I provide results in P(BB) = 1/3 and P(GG) = 0, which leaves P(BG) = P(GB) = 1/3 so that:
P(GG) = 0
P(BG) = 1/3
P(GB) = 1/3
P(BB) = 1/3
which has an entropy of -(1/3) log(1/3) - (1/3) log(1/3) - (1/3) log(1/3) = 1.0986 nats = 1.585 bits

The distribution I arrive at is uniform over the remaining possibilities, which means that it contains less information than the result you suggest (0.085 bits less, to be precise). Your solution assumes something about the motivation of the individual who provided the information, whereas my solution makes no such assumption. If there indeed was some motivation, then your results will be more accurate; if not, mine will be. However, in the case where one does not know whether the information was offered for a reason, it is generally assumed that it is safer to assume less.
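The two entropies quoted above can be verified with a few lines (an added check; log base 2 gives bits directly):

```python
import math

# Entropy in bits of a discrete distribution, skipping zero-probability states.
def entropy_bits(dist):
    return -sum(p * math.log2(p) for p in dist if p > 0)

h_half = entropy_bits([0, 1/4, 1/4, 1/2])    # the P(BB) = 1/2 solution
h_third = entropy_bits([0, 1/3, 1/3, 1/3])   # the P(BB) = 1/3 solution
print(h_half)              # 1.5
print(h_third)             # ≈ 1.585
print(h_third - h_half)    # ≈ 0.085 bits
```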

3. JeffJo
October 21, 2011 at 11:58 am

Applying and interpreting information theory correctly is a tricky business. You can do it wrong without recognizing how, and get a wrong answer. For example, if you apply it the same way you did to the Three Prisoners Problem, the incorrect answer that Tom and Harry each have a 1/2 chance for the pardon yields an entropy of -(1/2)*log(1/2) - (1/2)*log(1/2) = 1 bit. The correct answer yields -(1/3)*log(1/3) - (2/3)*log(2/3) = 0.918 bits, which is less. So should the answer be that Tom’s chances have improved? No, it shouldn’t.

Since the question asks for a conditional probability, you need to use conditional entropy. What you did was treat the posterior probability when you know there is a boy as though it was the prior probability. Instead, you should use the joint distribution representing both the family type and whatever fact you learn. If T is the random variable representing family type, and F is the random variable representing the fact you learn, then H(T|F) = sum_ij P(ti,fj)*log(P(fj)/P(ti,fj)) is the entropy of T given that you know what F is, which is our case. Representing “at least one is a boy” by ALOB, and whatever else you would know in its place by OTHER, the correct entropy for your solution to the Two Child Problem looks like this:

P(ALOB)=3/4
P(OTHER)=1/4
P(BB,ALOB)=1/4, P(BB,OTHER)=0
P(BG,ALOB)=1/4, P(BG,OTHER)=0
P(GB,ALOB)=1/4, P(GB,OTHER)=0
P(GG,ALOB)=0, P(GG,OTHER)=1/4
H(T|F)=(3)*[(1/4)*log((3/4)/(1/4))] + (1)* [(1/4)*log((1/4)/(1/4))]
= 1.189 bits.

But if you do it my way:

P(ALOB)=1/2
P(OTHER)=1/2
P(BB,ALOB)=1/4, P(BB,OTHER)=0
P(BG,ALOB)=1/8, P(BG,OTHER)=1/8
P(GB,ALOB)=1/8, P(GB,OTHER)=1/8
P(GG,ALOB)=0, P(GG,OTHER)=1/4
H(T|F)=(2)*[(1/4)*log((1/2)/(1/4))] + (4)* [(1/8)*log((1/2)/(1/8))]
= 1.5 bits.
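Both conditional-entropy calculations can be reproduced directly from the joint tables above. This is an added sketch; the rows are F = ALOB, OTHER and the columns are T = BB, BG, GB, GG.

```python
import math

# H(T|F) = sum_ij P(ti, fj) * log2(P(fj) / P(ti, fj)), zero terms skipped.
def cond_entropy(joint, marginals):
    return sum(p * math.log2(m / p)
               for row, m in zip(joint, marginals) for p in row if p > 0)

# Rows are F = ALOB, OTHER; columns are T = BB, BG, GB, GG.
first = [[1/4, 1/4, 1/4, 0], [0, 0, 0, 1/4]]        # the 1/3-answer model
second = [[1/4, 1/8, 1/8, 0], [0, 1/8, 1/8, 1/4]]   # the 1/2-answer model
print(cond_entropy(first, [3/4, 1/4]))    # ≈ 1.189 bits
print(cond_entropy(second, [1/2, 1/2]))   # 1.5 bits
```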

Doing the same thing with the Three Prisoners Problem yields 0.667 bits for the incorrect answer, and 0.918 for the correct one.

How additional information, like “born on Tuesday,” affects the answer is trivial to see once you understand the importance of the method used to determine that information. If the information is required to apply to any family you consider, then a family with two boys will be a little less than twice as likely to meet the requirement, compared to a family with only one. How much less than twice is determined by the likelihood the information applies to any one random boy. The less likely it is, the closer the factor is to 2. But if the information you learned was chosen at random from what is either one or two facts in the appropriate form (this is how I avoid picking a specific child), the probability of having two children with the gender included in that fact is always 1/2.

4. JeffJo
December 26, 2011 at 12:50 pm

One problem with probability puzzles, is that you can apply either intuitive solutions – like “if we know the older child is a boy, the younger child’s gender is independent so the probability of two boys is 1/2” – or mathematical solutions – like “Only BG and BB remain possibilities if we know the older child is a boy; since each possibility started with probability 1/4, the conditional probability of BB is (1/4)/(1/4+1/4)=1/2.” Sometimes the two get mixed, with intuitive elements getting put into a more mathematical approach somewhere in the middle. The problem comes in because it is nearly impossible to argue with someone that what appears to them to require no justification is unjustified; that is, to argue against the intuitive.

Unfortunately, for supposedly “simple” probability problems, people prefer to use solutions that are intuitive to some degree. This is what ultimately causes controversy. You not only have to present a correct solution that the other person will accept, but also a reason why the intuitive elements they used were wrong. The latter is nearly impossible to do, and once you fail they will claim something intuitive in your solution must be wrong.

I recently formulated a new argument that I hope will convince you that your intuitive arguments, which you apply when you do not possess the information about the motivations behind your knowledge, are incorrect. It involves a slight twist on another controversial problem called the Monty Hall Problem.

Suppose you are on a game show, and are offered the choice of three doors. Behind one is a brand new car. Behind the other two are goats; one goat is black, and one is white. You choose a door, but instead of opening it the host opens a door you didn’t choose, revealing the black goat. He then offers to let you switch your choice to the remaining closed door. Should you?

This problem is normally presented without mentioning the color of the goats. Most people initially (and intuitively) feel there is no benefit to switching, but more careful (and still intuitive) analysis shows that there should be a 2/3 chance of winning if you switch. Your original choice should win 1/3 of the time whether or not the host reveals a door, so it has to remain 1/3 when he does. All of the remaining probability must shift to the other closed door.

But what if we apply the mathematical analysis that produced the 1/3 and 13/27 answers to the Two Child Problem? There are initially six combinations for the three prizes: CBW, CWB, BCW, WCB, BWC, and WBC; where C is the car, B and W are the goats, and the order is your door, the host’s door, and the remaining door. All are equally likely to exist initially. But once “we know” that there must be a B in the second position, only CBW and WBC remain possibilities. The answer seems to be 1/2. And since it will also be 1/2 if we see a white goat, it must be 1/2 regardless of what color goat we see. So by “knowing” the color of the goat, the probability changes from 1/3 to 1/2.

But this is an incorrect solution. It used the intuitive-seeming, but logically invalid, proposition that knowing the position of the black goat is equivalent to reducing the set of possibilities to all those where the black goat is in that position. The correct mathematical solution allows the host to choose to reveal the white goat in the CBW case half of the time, making the probability 2/3 as we should expect.
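The colored-goat variant is easy to simulate. This added sketch assumes the host opens a goat door among the two you didn't pick, choosing uniformly when both hide goats:

```python
import random
from itertools import permutations

# Monte Carlo for the colored-goat Monty Hall variant described above.
random.seed(3)
arrangements = list(permutations(["car", "black", "white"]))
saw_black = win_by_switching = 0
for _ in range(100_000):
    doors = random.choice(arrangements)   # door 0 is yours, by symmetry
    goat_doors = [i for i in (1, 2) if doors[i] != "car"]
    opened = random.choice(goat_doors)    # host opens a goat door you didn't pick
    if doors[opened] == "black":
        saw_black += 1
        # Switching to the remaining closed door wins iff your door hides a goat.
        win_by_switching += doors[0] != "car"
print(win_by_switching / saw_black)       # ≈ 2/3, not 1/2
```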

This modification is exactly parallel to the original Two Child Problem, which you claim has the answer 1/3. That answer is based on a similarly invalid proposition, that knowing one child is a boy is equivalent to reducing the set of possibilities to all two-child families that include a boy. That is the problem with the intuitive portion of your answer. You accused me of supposing a driving motivation behind what you know, but in fact it is your solution that assumes such a motive. Mine assumes there is no known motivation, in which case “knowing there is a boy” and “knowing there is a girl” are equally likely in a mixed family.

And if you measure the information content correctly – by how much information is removed by each interpretation of the condition, not by how much is contained in a random process that produces either sample space unconditionally, as you did – you will see that assuming a lack of motivation removes less information, and so satisfies Occam’s Razor.