## Some central limit theorems pertinent to estimating the effectiveness of information retrieval

**A 2006 Preprint by É. Czabarka and J. Spouge**

- 2006:15
An urn contains white and black balls numbered arbitrarily from 1 to $d$. The tendency of white balls to have lower numbers than black balls can be measured by a “$\mathrm{ROC}_n$ score”. Now, let each ball have not one but two numbers on it, in red and green. Sample the balls with replacement from the urn into a “bag”, and consider the difference between the $\mathrm{ROC}_n$ scores in the bag for the two sets of numbers, red and green. The $\mathrm{ROC}_n$ difference has an approximate normal distribution, with mean equaling the difference in the urn. Now, bootstrap by sampling balls with replacement from the bag into a “sack”, and consider the $\mathrm{ROC}_n$ difference in the sack. Again, the difference has an approximate normal distribution, with mean equaling the difference in the bag. Moreover, the difference has approximately the same variance in the bag and sack, the condition required to justify bootstrap inferences about sampling from the urn. The results have practical relevance, because researchers use the $\mathrm{ROC}_n$ score to measure the efficacy of database retrieval. They then bootstrap the database to assign a P-value to the $\mathrm{ROC}_n$ difference between two retrieval algorithms.
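
The urn, bag, and sack construction can be illustrated with a minimal Python sketch. The abstract does not define the $\mathrm{ROC}_n$ score, so the sketch assumes the common truncated-ROC convention: order the balls by number, and for each of the first $n$ black balls count the white balls that precede it, then divide the total by $n$ times the number of white balls. The function names (`roc_n`, `roc_n_difference`, `sack_differences`), the tuple layout `(is_white, red_number, green_number)`, and the handling of samples with no white balls or fewer than $n$ black balls are all illustrative assumptions, not the authors' code.

```python
import random
import statistics


def roc_n(numbers, is_white, n):
    """ROC_n under an ASSUMED truncated-ROC definition (not given in the abstract).

    Balls are ordered by ascending number. For each of the first n black
    balls, count the white balls that precede it; the score is that total
    divided by n * (number of white balls).
    """
    order = sorted(range(len(numbers)), key=numbers.__getitem__)
    total_white = sum(is_white)
    if total_white == 0:
        return 0.0  # assumption: a sample with no white balls scores 0
    whites_seen = blacks_seen = area = 0
    for i in order:
        if is_white[i]:
            whites_seen += 1
        else:
            blacks_seen += 1
            area += whites_seen
            if blacks_seen == n:
                break
    # Assumption: if fewer than n black balls exist, each missing black ball
    # is treated as ranked after every white ball.
    area += (n - blacks_seen) * total_white
    return area / (n * total_white)


def roc_n_difference(balls, n):
    """ROC_n for the red numbers minus ROC_n for the green numbers.

    Each ball is a hypothetical tuple (is_white, red_number, green_number).
    """
    colors = [b[0] for b in balls]
    red = roc_n([b[1] for b in balls], colors, n)
    green = roc_n([b[2] for b in balls], colors, n)
    return red - green


def sack_differences(bag, n, reps, rng):
    """Bootstrap: draw `reps` sacks with replacement from the bag and
    return the ROC_n difference observed in each sack."""
    return [
        roc_n_difference(rng.choices(bag, k=len(bag)), n)
        for _ in range(reps)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 300
    red = rng.sample(range(1, d + 1), d)    # arbitrary red numbering 1..d
    green = rng.sample(range(1, d + 1), d)  # arbitrary green numbering 1..d
    urn = [(rng.random() < 0.3, red[i], green[i]) for i in range(d)]

    bag = rng.choices(urn, k=d)             # sample the urn with replacement
    diffs = sack_differences(bag, n=20, reps=500, rng=rng)

    print("difference in urn:", roc_n_difference(urn, 20))
    print("difference in bag:", roc_n_difference(bag, 20))
    print("mean over sacks:  ", statistics.mean(diffs))
    print("variance over sacks:", statistics.pvariance(diffs))
```

Drawing the same ball twice produces tied numbers, but tied balls always share a color, so the order among them does not affect the score. Repeating the bag-drawing step many times and comparing the variance of the bag differences with the variance over sacks gives an empirical check of the variance agreement the abstract describes.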