(by Michael Lavine)

In the May 2010 issue of Statistical Science, Bradley Efron wrote an article on The Future of Indirect Evidence. His point is that indirect evidence is all around us, in increasing amounts, and that statistics is adapting, and may have to adapt further, to handle it. In this post, I’d like to use some of his examples to ask whether the distinction between direct and indirect evidence is real. All examples in this post come from Efron’s article.

Example 1. A couple is expecting twins. From a sonogram, they know the twins are both boys. What is the probability they are identical? Efron says that among all twins, Pr[Identical] = 1/3 and Pr[Fraternal] = 2/3. Further, Pr[both boys | Identical]/Pr[both boys | Fraternal] = 2. Multiplying the prior odds of 1:2 by this likelihood ratio of 2 gives posterior odds of 1:1, so Pr[Identical | both boys] = Pr[Fraternal | both boys] = 1/2. So far, so good. But Efron goes on to remark that the couple “are learning [directly] from their own experience (the sonogram), but also indirectly from the experience of others” [the 1:2 prior odds of Identical to Fraternal].
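For readers who want to check the arithmetic in Example 1, here is a minimal sketch of the odds form of Bayes’ rule. The function name `posterior_identical` is hypothetical; the only inputs are the two numbers quoted from Efron (a prior of 1/3 and a likelihood ratio of 2).

```python
from fractions import Fraction


def posterior_identical(prior_identical, likelihood_ratio):
    """Posterior Pr[Identical | both boys] via the odds form of Bayes' rule.

    prior_identical:  Pr[Identical] among all twins.
    likelihood_ratio: Pr[both boys | Identical] / Pr[both boys | Fraternal].
    """
    # Prior odds of Identical to Fraternal (1:2 in Example 1).
    prior_odds = prior_identical / (1 - prior_identical)
    # Posterior odds = prior odds x likelihood ratio.
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)


p = posterior_identical(Fraction(1, 3), Fraction(2))
print(p)  # 1/2
```

Exact fractions are used so the result comes out as 1/2 with no rounding. Note that the sonogram enters only through the likelihood ratio, while the experience of other couples enters only through the prior.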

Example 2. We know from experience that kidney function decreases with age. A new kidney donor, aged 55, appears. How good are his kidneys? One previous donor in our database was 55 years old. Efron says that this other 55-year-old donor provides direct evidence, while the remaining donors provide indirect evidence, through the regression function, about the new donor’s kidney function.

Here’s what puzzles me. In Example 1, the experience of everyone else — millions of couples who had twins — similarly situated to the couple of interest, is called indirect. But in Example 2, the experience of everyone else — just one person this time — similarly situated to the person of interest, is called direct.

I don’t see the difference. Can anyone enlighten me?


My reading of Efron’s paper is that the kidney example needs to be read in full to understand his point. In the study, 157 volunteers had their kidneys rigorously tested as a function of age, and only one of them was 55. That 55-year-old study subject contributes direct evidence of kidney efficacy as a function of age alone.

A real-life 55-year-old shows up to donate a kidney. How can we determine his kidney efficacy without rigorous testing? Efron points out that the study fit a regression by OLS, which effectively pools every subject’s data into one line, mimicking the combination of direct and indirect evidence via Bayes’ rule that Efron discusses earlier in the paper. In effect, the regression line does the same job: it uses all the subjects who are not 55 as indirect evidence about what a 55-year-old’s kidneys would look like. Most statisticians would more happily use the OLS fitted value at age 55 than the single 55-year-old subject’s measurement, thereby tacitly combining direct and indirect evidence.
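To make the contrast concrete, here is a rough sketch that compares the two estimates for a 55-year-old: the single matching subject’s measurement versus the OLS fitted value. The five data points are made up purely for illustration and are not Efron’s kidney data; `ols_fit` is a hypothetical helper implementing the textbook closed-form least-squares line.

```python
def ols_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# HYPOTHETICAL data: (age, kidney score) pairs invented for this sketch.
ages = [25, 35, 45, 55, 65]
scores = [2.0, 1.0, 0.0, -2.5, -2.0]

intercept, slope = ols_fit(ages, scores)

# "Direct" estimate: the one subject who is exactly 55.
direct_estimate = scores[ages.index(55)]

# Regression estimate: fitted value at 55, which draws on every subject.
regression_estimate = intercept + slope * 55

print(f"direct: {direct_estimate:.2f}, regression: {regression_estimate:.2f}")
```

In this toy data the lone 55-year-old sits below the fitted line, so the regression estimate is pulled toward the overall trend set by the other subjects. That pull is the sense in which the other ages act as indirect evidence.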

Isn’t all evidence indirect, in all cases? No measuring device (in physics or statistics or politics) measures exactly what you care about in your mechanistic model. Surveys measure what people respond in surveys. Observations measure what is observable. Even sonograms measure sound reflectivity, which only relates to the properties of babies indirectly, no?

Brad Efron wrote:

Not all the “millions” of twins in the first example were known a priori to be same sex. That’s why some of the evidence is indirect.

The probability ratio you give is 1/2 not 2.

In Example 1, the probability that the twins are identical is inferred by a probability calculation using Bayes’ rule, not by counting the number of identical twin-boy pairs and dividing by the total number of twin-boy pairs. If it comes directly from data, it is direct. If it is inferred by probability theory, it is indirect. It is just that simple. The distinction has nothing to do with how the data is collected or measured.

In my previous comment I took Pr[both boys | Identical]/Pr[both boys | Fraternal] = 2 to be inferred rather than stated. It seems to be correct as written.